Liang Zhao

School of Computer Science, Shenyang Aerospace University

NK Hybrid Genetic Algorithm for Clustering

Feb 06, 2024

Renato Tinós, Liang Zhao, Francisco Chicano, Darrell Whitley

Abstract: This paper proposes the NK hybrid genetic algorithm for clustering. To evaluate solutions, the hybrid algorithm uses the NK clustering validation criterion 2 (NKCV2). NKCV2 uses information about the disposition of $N$ small groups of objects, each composed of $K+1$ objects of the dataset. Experimental results show that density-based regions can be identified by using NKCV2 with a fixed, small $K$. In NKCV2, the relationships between decision variables are known, which in turn allows gray-box optimization to be applied. Mutation operators, a partition crossover, and a local search strategy are proposed, all using information about the relationships between decision variables. In partition crossover, the evaluation function is decomposed into $q$ independent components; partition crossover then deterministically returns the best among $2^q$ possible offspring with computational complexity $O(N)$. The NK hybrid genetic algorithm allows the detection of clusters with arbitrary shapes and the automatic estimation of the number of clusters. In the experiments, the NK hybrid genetic algorithm produced very good results compared with another genetic algorithm approach and with state-of-the-art clustering algorithms.

* IEEE Trans. Evol. Comput. 22(5): 748-761 (2018)
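
Partition crossover is easiest to see on a toy gray-box function. The sketch below is a minimal Python illustration, assuming a hypothetical additive fitness built from subfunctions over known variable subsets (to be maximized); it is not NKCV2 and not the authors' implementation. Variables on which the parents differ are grouped into one component whenever they share a subfunction (these are the $q$ independent components), and each component is then copied from whichever parent scores higher on it. This yields the best of the $2^q$ component-wise offspring in a single near-linear pass, without enumerating them.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical gray-box representation: fitness(x) = sum of subfunctions,
# each reading a small, known subset of variables (as in NK-style functions).
Subfunction = Tuple[Tuple[int, ...], Callable[[Tuple[int, ...]], float]]


def partition_crossover(p1: Sequence[int], p2: Sequence[int],
                        subfunctions: List[Subfunction]) -> List[int]:
    """Return the best of the 2^q offspring obtained by taking each
    independent component from one parent or the other (maximization)."""
    diff = [i for i in range(len(p1)) if p1[i] != p2[i]]
    parent: Dict[int, int] = {i: i for i in diff}  # union-find over differing vars

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Differing variables that share a subfunction belong to the same component.
    for vars_, _ in subfunctions:
        dv = [v for v in vars_ if v in parent]
        for a, b in zip(dv, dv[1:]):
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb

    # Score each component under p1's and p2's values. Each subfunction touches
    # at most one component; subfunctions touching none score equally in both.
    score: Dict[int, List[float]] = {}
    for vars_, fn in subfunctions:
        dv = [v for v in vars_ if v in parent]
        if not dv:
            continue
        s = score.setdefault(find(dv[0]), [0.0, 0.0])
        s[0] += fn(tuple(p1[v] for v in vars_))
        s[1] += fn(tuple(p2[v] for v in vars_))

    # Choose each component independently; ties keep p1's values.
    child = list(p1)
    for i in diff:
        s1, s2 = score.get(find(i), [0.0, 0.0])
        if s2 > s1:
            child[i] = p2[i]
    return child


def fitness(x: Sequence[int], subfunctions: List[Subfunction]) -> float:
    return sum(fn(tuple(x[v] for v in vars_)) for vars_, fn in subfunctions)


subs = [
    ((0, 1), lambda x: 2 * x[0] + x[1]),
    ((2, 3), lambda x: 2 if x == (0, 1) else x[0]),
    ((1,), lambda x: x[0]),
]
p1, p2 = [0, 1, 0, 1], [1, 1, 1, 0]
child = partition_crossover(p1, p2, subs)
print(child, fitness(child, subs))  # [1, 1, 0, 1] 6
```

In this toy example the parents score 4 and 5, while the child scores 6: it takes component {0} from `p2` and component {2, 3} from `p1`.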

3DPFIX: Improving Remote Novices' 3D Printing Troubleshooting through Human-AI Collaboration

Feb 02, 2024

Nahyun Kwon, Tong Sun, Yuyang Gao, Liang Zhao, Xu Wang, Jeeeun Kim, Sungsoo Ray Hong

Abstract: The wide availability of consumer-grade 3D printers and online learning resources enables novices to self-train in remote settings. While troubleshooting is an essential part of 3D printing, the process remains challenging for many remote novices, even with the help of well-developed online resources such as troubleshooting archives and community help. We conducted a formative study with 76 active 3D printing users to learn how remote novices leverage online resources in troubleshooting and what challenges they face. We found that remote novices cannot fully utilize online resources. For example, online archives provide only static, general information, making it hard for novices to search them and relate their unique cases to existing descriptions. Online communities can potentially ease these struggles by providing more targeted suggestions, but helpers who can provide custom help are scarce, making it hard to obtain timely assistance. We propose 3DPFIX, an interactive 3D printing troubleshooting system powered by a pipeline that facilitates Human-AI Collaboration, designed to improve novices' 3D printing experiences and thus help them easily accumulate domain knowledge. 3DPFIX supports automated diagnosis and solution-seeking, and it is built upon shared dialogues about failure cases from Q&A discourse accumulated in online communities. We leverage social annotations (i.e., comments) to build an annotated failure-image dataset for AI classifiers and to extract a solution pool. Our summative study revealed that using 3DPFIX helped participants spend significantly less effort diagnosing failures and find more accurate solutions than relying on their common practice. We also found that 3DPFIX users learned 3D printing domain-specific knowledge. We discuss the implications of leveraging community-driven data in developing future Human-AI Collaboration designs.

* CSCW2024

Explaining latent representations of generative models with large multimodal models

Feb 02, 2024

Mengdan Zhu, Zhenke Liu, Bo Pan, Abhinav Angirekula, Liang Zhao

Abstract: Learning interpretable representations of the latent factors that generate data is an important topic in the development of artificial intelligence. Large multimodal models, which are on the rise, can align images with text to generate answers. In this work, we propose a framework that uses a large multimodal model to comprehensively explain each latent factor in generative models. We further measure the uncertainty of the generated explanations, quantitatively evaluate explanation generation across multiple large multimodal models, and qualitatively visualize the variations of each latent factor to study how the disentanglement of different generative models affects the explanations. Finally, we discuss the explanatory capabilities and limitations of state-of-the-art large multimodal models.
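
The traversal step behind visualizing the variations of each latent factor can be sketched as follows. This is a minimal Python illustration under our own assumptions: the latent vector, the traversal range, and the downstream decoder and multimodal-model calls are hypothetical and not taken from the paper. Only one latent coordinate changes across rows, so any difference between the decoded images can be attributed to that factor.

```python
from typing import List, Sequence


def linspace(lo: float, hi: float, n: int) -> List[float]:
    """Evenly spaced values from lo to hi inclusive (n >= 1)."""
    if n == 1:
        return [lo]
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def latent_traversal(z: Sequence[float], dim: int,
                     values: Sequence[float]) -> List[List[float]]:
    """Copies of latent vector z in which only coordinate `dim` varies."""
    grid = []
    for v in values:
        z_new = list(z)
        z_new[dim] = v
        grid.append(z_new)
    return grid


# Each row would be passed to a (hypothetical) decoder, and the resulting
# image strip shown to a multimodal model to describe what factor `dim` controls.
rows = latent_traversal([0.1, 0.2, 0.3], dim=1, values=linspace(-2.0, 2.0, 5))
for r in rows:
    print(r)
```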

Small Language Model Meets with Reinforced Vision Vocabulary

Jan 23, 2024

Haoran Wei, Lingyu Kong, Jinyue Chen, Liang Zhao, Zheng Ge, En Yu, Jianjian Sun, Chunrui Han, Xiangyu Zhang

Abstract: In 2023, playing with Large Vision Language Models (LVLMs) is trendy in the AI community. However, the relatively large number of parameters (more than 7B) of popular LVLMs makes them difficult to train and deploy on consumer GPUs, discouraging many researchers with limited resources. Imagine how cool it would be to experience all the features of current LVLMs on an old GTX1080ti (our only game card). Accordingly, we present Vary-toy in this report, a small-size Vary with Qwen-1.8B as the base "large" language model. In Vary-toy, we introduce an improved vision vocabulary that allows the model not only to retain all the features of Vary but also to gain more generality. Specifically, when generating the vision vocabulary, we replace negative samples of natural images with positive sample data driven by object detection, making fuller use of the vocabulary network's capacity and enabling it to efficiently encode visual information corresponding to natural objects. In experiments, Vary-toy achieves 65.6% ANLS on DocVQA, 59.1% accuracy on ChartQA, 88.1% accuracy on RefCOCO, and 29% on MMVet. The code will be publicly available on the homepage.
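
Both Vary-toy and Vary report ANLS on DocVQA. A minimal Python sketch of the commonly used ANLS definition (Average Normalized Levenshtein Similarity with threshold $\tau = 0.5$) follows: each prediction scores $1 - d/\max(|p|, |g|)$ against its closest gold answer, or 0 if that normalized distance reaches $\tau$, and scores are averaged over questions. The lowercasing and whitespace stripping are our assumptions; the official evaluation script may normalize answers differently.

```python
from typing import List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions), O(len(a)*len(b))."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def anls(predictions: Sequence[str], gold: Sequence[List[str]],
         tau: float = 0.5) -> float:
    """Average over questions of the best similarity to any gold answer;
    similarities whose normalized distance reaches tau are zeroed."""
    total = 0.0
    for pred, answers in zip(predictions, gold):
        p = pred.strip().lower()  # normalization choice is an assumption
        best = 0.0
        for ans in answers:
            g = ans.strip().lower()
            denom = max(len(p), len(g))
            nl = levenshtein(p, g) / denom if denom else 0.0
            best = max(best, 1.0 - nl if nl < tau else 0.0)
        total += best
    return total / len(predictions)


print(anls(["helo", "abc"], [["hello"], ["xyz"]]))
```

Here "helo" is one edit from "hello" (similarity 0.8), while "abc" versus "xyz" exceeds the threshold and scores 0, so the average is 0.4.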

Gene-associated Disease Discovery Powered by Large Language Models

Jan 16, 2024

Jiayu Chang, Shiyu Wang, Chen Ling, Zhaohui Qin, Liang Zhao

Abstract: The intricate relationship between genetic variation and human diseases has been a focal point of medical research, as evidenced by the identification of risk genes for specific diseases. Advanced genome sequencing techniques have significantly improved the efficiency and cost-effectiveness of detecting these genetic markers; such detection plays a crucial role in disease diagnosis and forms the basis for clinical decision-making and early risk assessment. Existing databases record disease-gene associations from the literature but often lack real-time updates. To overcome this limitation, we propose a novel framework employing Large Language Models (LLMs) to discover diseases associated with specific genes. The framework aims to automate the labor-intensive process of sifting through medical literature for evidence linking genetic variations to diseases, thereby making disease identification more efficient. Our approach uses LLMs to conduct literature searches, summarize relevant findings, and pinpoint diseases related to specific genes. This paper details the development and application of our LLM-powered framework, demonstrating its potential to streamline the complex process of literature retrieval and summarization for identifying diseases associated with specific genetic variations.

* This is the official paper accepted by AAAI 2024 Workshop on Large Language Models for Biological Discoveries

Beyond Efficiency: A Systematic Survey of Resource-Efficient Large Language Models

Jan 04, 2024

Guangji Bai, Zheng Chai, Chen Ling, Shiyu Wang, Jiaying Lu, Nan Zhang, Tingwei Shi, Ziyang Yu, Mengdan Zhu, Yifei Zhang(+3 more)

Abstract: The burgeoning field of Large Language Models (LLMs), exemplified by sophisticated models such as OpenAI's ChatGPT, represents a significant advancement in artificial intelligence. These models, however, pose substantial challenges due to their high consumption of computational, memory, energy, and financial resources, especially in resource-constrained environments. This survey aims to address these challenges systematically by reviewing a broad spectrum of techniques designed to enhance the resource efficiency of LLMs. We categorize methods by their optimization focus (computational, memory, energy, financial, and network resources) and by their applicability across the stages of an LLM's lifecycle, including architecture design, pretraining, finetuning, and system design. Additionally, the survey introduces a nuanced categorization of resource-efficiency techniques by their specific resource types, which uncovers the intricate relationships and mappings between various resources and the corresponding optimization techniques. A standardized set of evaluation metrics and datasets is also presented to facilitate consistent and fair comparisons across different models and techniques. By offering a comprehensive overview of the current state of the art and identifying open research avenues, this survey serves as a foundational reference for researchers and practitioners, aiding them in developing more sustainable and efficient LLMs in a rapidly evolving landscape.

* Preprint. GitHub repo: https://github.com/tiingweii-shii/Awesome-Resource-Efficient-LLM-Papers
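
To make the memory side of resource efficiency concrete, here is a back-of-the-envelope Python sketch (our illustration, not taken from the survey): storing the weights alone takes `num_params * bits / 8` bytes, so lowering numeric precision shrinks the footprint proportionally. It deliberately ignores activations, the KV cache, gradients, and optimizer state, which add substantially more, especially during training.

```python
def weight_memory_bytes(num_params: int, bits_per_param: int) -> int:
    """Bytes needed to store model weights only (no activations, KV cache,
    gradients, or optimizer state)."""
    return num_params * bits_per_param // 8


GIB = 1024 ** 3
for name, params in [("7B model", 7_000_000_000), ("1.8B model", 1_800_000_000)]:
    for bits in (32, 16, 8, 4):
        gib = weight_memory_bytes(params, bits) / GIB
        print(f"{name}, {bits}-bit weights: {gib:.1f} GiB")
```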

Length Extrapolation of Transformers: A Survey from the Perspective of Position Encoding

Dec 29, 2023

Liang Zhao, Xiaocheng Feng, Xiachong Feng, Bing Qin, Ting Liu

Abstract: Since its introduction, the Transformer has taken the natural language processing (NLP) field by storm, owing to its superior ability to model complex dependencies in sequences. Despite the great success of Transformer-based pretrained language models (PLMs) across almost all NLP tasks, they all suffer from a preset length limit and thus can hardly extend this success to sequences longer than those seen during training, which is known as the length extrapolation problem. Length extrapolation has aroused great interest among researchers, as it is the core feature of human language capacity. To enhance the length extrapolation of Transformers, a plethora of methods have been proposed, mostly focusing on extrapolatable position encodings. In this article, we provide an organized and systematic review of these research efforts in a unified notation from a position encoding perspective, aiming to give the reader a deep understanding of existing methods and to provide stimuli for future research.

* Work in progress
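
As one concrete position encoding from this line of work (our choice of example, not a summary of the survey's taxonomy), rotary position embedding (RoPE) rotates each consecutive pair of query/key dimensions by an angle proportional to the token position. Because rotations compose, the dot product of a rotated query at position $m$ and a rotated key at position $n$ depends only on the offset $n - m$. The minimal Python sketch below uses the common base of 10000 (an assumption) and checks that property numerically.

```python
import math
from typing import List, Sequence


def rope(x: Sequence[float], pos: int, base: float = 10000.0) -> List[float]:
    """Rotate each consecutive pair (x[2i], x[2i+1]) by angle pos * base^(-2i/d)."""
    d = len(x)
    if d % 2:
        raise ValueError("dimension must be even")
    out = [0.0] * d
    for i in range(d // 2):
        theta = pos * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


q = [0.3, -1.2, 0.5, 2.0]
k = [1.0, 0.4, -0.7, 0.1]
# Same relative offset (3), different absolute positions: scores match.
near = dot(rope(q, 5), rope(k, 2))
far = dot(rope(q, 105), rope(k, 102))
print(near, far)
```

Relative-offset dependence alone does not guarantee good extrapolation, since positions beyond the training length still produce rotation angles the model has not seen; that gap is what extrapolation-oriented encodings try to address.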

Prompt-based Domain Discrimination for Multi-source Time Series Domain Adaptation

Dec 19, 2023

Junxiang Wang, Guangji Bai, Wei Cheng, Zhengzhang Chen, Liang Zhao, Haifeng Chen

Abstract: Time series domain adaptation is a pivotal and intricate challenge with diverse applications, including but not limited to human activity recognition, sleep stage classification, and machine fault diagnosis. Although numerous domain adaptation techniques have been proposed to tackle this complex problem, they have focused primarily on the common representations of time series data. This focus may cause valuable domain-specific information originating from different source domains to be overlooked. To bridge this gap, we introduce POND, a novel prompt-based deep learning model designed explicitly for multi-source time series domain adaptation. POND is tailored to address two significant challenges: 1) the unavailability of a quantitative relationship between meta-data information and time series distributions, and 2) the dearth of exploration into extracting domain-specific meta-data information. In this paper, we present an instance-level prompt generator and a fidelity loss mechanism to facilitate the faithful learning of meta-data information. Additionally, we propose a domain discrimination technique to discern domain-specific meta-data information from multiple source domains. Our approach involves a simple yet effective meta-learning algorithm to optimize the objective efficiently. Furthermore, we augment the model's performance by incorporating the Mixture of Experts (MoE) technique. The efficacy and robustness of POND are extensively validated through experiments across 50 scenarios encompassing five datasets, which demonstrate that POND outperforms state-of-the-art methods by up to $66\%$ on the F1-score.

* Work in progress
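
POND incorporates a Mixture of Experts, whose specifics the abstract does not give. For orientation only, the following is a generic dense MoE sketch in Python, under our own assumptions (a linear gate, softmax weights over all experts, scalar-valued experts); many MoE variants instead route each input to only the top-k experts.

```python
import math
from typing import Callable, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: Sequence[float],
                experts: Sequence[Callable[[Sequence[float]], float]],
                gate_weights: Sequence[Sequence[float]]) -> float:
    """Dense mixture of experts: a linear gate scores each expert for input x,
    and the output is the softmax-weighted sum of expert outputs."""
    logits = [sum(w * xi for w, xi in zip(ws, x)) for ws in gate_weights]
    probs = softmax(logits)
    return sum(p * f(x) for p, f in zip(probs, experts))


experts = [sum, max, lambda v: 0.0]
x = [1.0, 2.0, 3.0]
uniform_gate = [[0.0] * 3 for _ in experts]  # equal logits -> plain average
print(moe_forward(x, experts, uniform_gate))
```

With equal gate logits the output is simply the mean of the expert outputs; a gate that strongly favors one expert makes the output approach that expert's prediction.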

Non-Euclidean Spatial Graph Neural Network

Dec 17, 2023

Zheng Zhang, Sirui Li, Jingcheng Zhou, Junxiang Wang, Abhinav Angirekula, Allen Zhang, Liang Zhao

Abstract: Spatial networks are networks whose graph topology is constrained by the space in which they are embedded. Understanding the coupled spatial-graph properties is crucial for extracting powerful representations from spatial networks; merely combining individual spatial and network representations therefore cannot reveal the underlying interaction mechanism of spatial networks. Moreover, existing spatial network representation learning methods can only consider networks embedded in Euclidean space and cannot fully exploit the rich geometric information carried by irregular and non-uniform non-Euclidean space. To address this issue, we propose a novel generic framework to learn the representation of spatial networks embedded in non-Euclidean manifold space. Specifically, a novel message-passing-based neural network is proposed to combine graph topology and spatial geometry, where spatial geometry is extracted as messages on the edges. We theoretically guarantee that the learned representations are provably invariant to important symmetries such as rotation and translation, while maintaining sufficient ability to distinguish different geometric structures. The strength of the proposed method is demonstrated through extensive experiments on both synthetic and real-world datasets.

* Accepted by SDM 2024
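
The abstract does not specify the architecture, so the following only illustrates the invariance idea. In this minimal Python sketch, nodes lie on the unit sphere (a simple non-Euclidean manifold, chosen by us as an assumption), each edge message is a neighbor feature weighted by the great-circle distance, and a rotation of all coordinates leaves the output unchanged because rotations preserve inner products. It does not model the paper's ability to distinguish geometric structures, and translation has no direct analogue on the sphere.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def normalize(v: Sequence[float]) -> Vec3:
    n = math.sqrt(sum(c * c for c in v))
    return (v[0] / n, v[1] / n, v[2] / n)


def geodesic(u: Vec3, v: Vec3) -> float:
    """Great-circle distance between two points on the unit sphere."""
    c = max(-1.0, min(1.0, sum(a * b for a, b in zip(u, v))))
    return math.acos(c)


def message_pass(h: Sequence[float], coords: Sequence[Vec3],
                 edges: Sequence[Tuple[int, int]]) -> List[float]:
    """One layer: each node adds neighbor features weighted by exp(-geodesic)."""
    out = list(h)
    for src, dst in edges:
        out[dst] += h[src] * math.exp(-geodesic(coords[src], coords[dst]))
    return out


def rotate_z(v: Vec3, angle: float) -> Vec3:
    c, s = math.cos(angle), math.sin(angle)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1], v[2])


coords = [normalize(p) for p in [(1, 0, 0), (0, 1, 0), (1, 1, 1), (-1, 0.5, 0.2)]]
edges = [(0, 1), (1, 2), (2, 0), (3, 0)]
h = [1.0, 2.0, -0.5, 0.3]
before = message_pass(h, coords, edges)
after = message_pass(h, [rotate_z(p, 0.7) for p in coords], edges)
print(max(abs(a - b) for a, b in zip(before, after)))
```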

Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models

Dec 11, 2023

Haoran Wei, Lingyu Kong, Jinyue Chen, Liang Zhao, Zheng Ge, Jinrong Yang, Jianjian Sun, Chunrui Han, Xiangyu Zhang

Abstract: Modern Large Vision-Language Models (LVLMs) share the same vision vocabulary, CLIP, which covers most common vision tasks. However, for special vision tasks that need dense and fine-grained perception, e.g., document-level OCR or chart understanding, especially in non-English scenarios, the CLIP-style vocabulary may tokenize the vision knowledge inefficiently and even suffer from out-of-vocabulary problems. Accordingly, we propose Vary, an efficient and effective method to scale up the vision vocabulary of LVLMs. Vary naturally divides into two phases: the generation and the integration of a new vision vocabulary. In the first phase, we devise a vocabulary network along with a tiny decoder-only transformer to produce the desired vocabulary via autoregression. In the second phase, we scale up the vanilla vision vocabulary by merging the new one with the original one (CLIP), enabling LVLMs to quickly acquire new features. Compared with the popular BLIP-2, MiniGPT4, and LLaVA, Vary maintains its vanilla capabilities while offering better fine-grained perception and understanding. Specifically, Vary is competent in new document parsing features (OCR or markdown conversion) while achieving 78.2% ANLS on DocVQA and 36.2% on MMVet. Our code will be publicly available on the homepage.
