Zhiqiang Hu

SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages

Jul 29, 2024

Wenxuan Zhang, Hou Pong Chan, Yiran Zhao, Mahani Aljunied, Jianyu Wang, Chaoqun Liu, Yue Deng, Zhiqiang Hu, Weiwen Xu, Yew Ken Chia(+2 more)

Abstract:Large Language Models (LLMs) have shown remarkable abilities across various tasks, yet their development has predominantly centered on high-resource languages like English and Chinese, leaving low-resource languages underserved. To address this disparity, we present SeaLLMs 3, the latest iteration of the SeaLLMs model family, tailored for Southeast Asian languages. This region, characterized by its rich linguistic diversity, has lacked adequate language technology support. SeaLLMs 3 aims to bridge this gap by covering a comprehensive range of languages spoken in this region, including English, Chinese, Indonesian, Vietnamese, Thai, Tagalog, Malay, Burmese, Khmer, Lao, Tamil, and Javanese. Leveraging efficient language enhancement techniques and a specially constructed instruction tuning dataset, SeaLLMs 3 significantly reduces training costs while maintaining high performance and versatility. Our model excels in tasks such as world knowledge, mathematical reasoning, translation, and instruction following, achieving state-of-the-art performance among similarly sized models. Additionally, we prioritized safety and reliability by addressing both general and culture-specific considerations and incorporated mechanisms to reduce hallucinations. This work underscores the importance of inclusive AI, showing that advanced LLM capabilities can benefit underserved linguistic and cultural communities.

InstructAV: Instruction Fine-tuning Large Language Models for Authorship Verification

Jul 16, 2024

Yujia Hu, Zhiqiang Hu, Chun-Wei Seah, Roy Ka-Wei Lee

Abstract:Large Language Models (LLMs) have demonstrated remarkable proficiency in a wide range of NLP tasks. However, when it comes to authorship verification (AV) tasks, which involve determining whether two given texts share the same authorship, even advanced models like ChatGPT exhibit notable limitations. This paper introduces a novel approach, termed InstructAV, for authorship verification. This approach utilizes LLMs in conjunction with a parameter-efficient fine-tuning (PEFT) method to simultaneously improve accuracy and explainability. The distinctiveness of InstructAV lies in its ability to align classification decisions with transparent and understandable explanations, representing a significant progression in the field of authorship verification. Through comprehensive experiments conducted across various datasets, InstructAV demonstrates its state-of-the-art performance on the AV task, offering high classification accuracy coupled with enhanced explanation reliability.

Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models

Jun 26, 2024

Wenhao Shi, Zhiqiang Hu, Yi Bin, Junhua Liu, Yang Yang, See-Kiong Ng, Lidong Bing, Roy Ka-Wei Lee

Abstract:Large language models (LLMs) have demonstrated impressive reasoning capabilities, particularly in textual mathematical problem-solving. However, existing open-source image instruction fine-tuning datasets, containing limited question-answer pairs per image, do not fully exploit visual information to enhance the multimodal mathematical reasoning capabilities of Multimodal LLMs (MLLMs). To bridge this gap, we address the lack of high-quality, diverse multimodal mathematical datasets by collecting 40K high-quality images with question-answer pairs from 24 existing datasets and synthesizing 320K new pairs, creating the MathV360K dataset, which enhances both the breadth and depth of multimodal mathematical questions. We introduce Math-LLaVA, a LLaVA-1.5-based model fine-tuned with MathV360K. This novel approach significantly improves the multimodal mathematical reasoning capabilities of LLaVA-1.5, achieving a 19-point increase and comparable performance to GPT-4V on MathVista's minitest split. Furthermore, Math-LLaVA demonstrates enhanced generalizability, showing substantial improvements on the MMMU benchmark. Our research highlights the importance of dataset diversity and synthesis in advancing MLLMs' mathematical reasoning abilities. The code and data are available at: https://github.com/HZQ950419/Math-LLaVA.

* 8 pages

All in a Single Image: Large Multimodal Models are In-Image Learners

Feb 28, 2024

Lei Wang, Wanyu Xu, Zhiqiang Hu, Yihuai Lan, Shan Dong, Hao Wang, Roy Ka-Wei Lee, Ee-Peng Lim

Abstract:This paper introduces a new in-context learning (ICL) mechanism called In-Image Learning (I$^2$L) that combines demonstration examples, visual cues, and instructions into a single image to enhance the capabilities of GPT-4V. Unlike previous approaches that rely on converting images to text or incorporating visual input into language models, I$^2$L consolidates all information into one image and primarily leverages image processing, understanding, and reasoning abilities. This has several advantages: it avoids inaccurate textual descriptions of complex images, provides flexibility in positioning demonstration examples, reduces the input burden, and avoids exceeding input limits by eliminating the need for multiple images and lengthy text. To further combine the strengths of different ICL methods, we introduce an automatic strategy to select the appropriate ICL method for a data example in a given task. We conducted experiments on MathVista and HallusionBench to test the effectiveness of I$^2$L in complex multimodal reasoning tasks and in mitigating language hallucination and visual illusion. Additionally, we explored the impact of image resolution, the number of demonstration examples, and their positions on the effectiveness of I$^2$L. Our code is publicly available at https://github.com/AGI-Edgerunners/IIL.

* WIP

LLM-Based Agent Society Investigation: Collaboration and Confrontation in Avalon Gameplay

Oct 23, 2023

Yihuai Lan, Zhiqiang Hu, Lei Wang, Yang Wang, Deheng Ye, Peilin Zhao, Ee-Peng Lim, Hui Xiong, Hao Wang

Abstract:This paper aims to investigate the open research problem of uncovering the social behaviors of LLM-based agents. To achieve this goal, we adopt Avalon, a representative communication game, as the environment and use system prompts to guide LLM agents to play the game. While previous studies have conducted preliminary investigations into gameplay with LLM agents, research on their social behaviors is lacking. In this paper, we present a novel framework designed to seamlessly adapt to Avalon gameplay. The core of our proposed framework is a multi-agent system that enables efficient communication and interaction among agents. We evaluate the performance of our framework based on metrics from two perspectives: winning the game and analyzing the social behaviors of LLM agents. Our results demonstrate the effectiveness of our framework in generating adaptive and intelligent agents and highlight the potential of LLM-based agents in addressing the challenges associated with dynamic social environment interaction. By analyzing the social behaviors of LLM agents from the aspects of both collaboration and confrontation, we provide insights into the research and applications of this domain.

Who Wrote it and Why? Prompting Large-Language Models for Authorship Verification

Oct 12, 2023

Chia-Yu Hung, Zhiqiang Hu, Yujia Hu, Roy Ka-Wei Lee

Abstract:Authorship verification (AV) is a fundamental task in natural language processing (NLP) and computational linguistics, with applications in forensic analysis, plagiarism detection, and identification of deceptive content. Existing AV techniques, including traditional stylometric and deep learning approaches, face limitations in terms of data requirements and lack of explainability. To address these limitations, this paper proposes PromptAV, a novel technique that leverages Large-Language Models (LLMs) for AV by providing step-by-step stylometric explanation prompts. PromptAV outperforms state-of-the-art baselines, operates effectively with limited training data, and enhances interpretability through intuitive explanations, showcasing its potential as an effective and interpretable solution for the AV task.

* 7 pages, 1 figure

Dynamic Spectrum Mixer for Visual Recognition

Sep 15, 2023

Zhiqiang Hu, Tao Yu

Abstract:Recently, MLP-based vision backbones have achieved promising performance in several visual recognition tasks. However, the existing MLP-based methods directly aggregate tokens with static weights, leaving the adaptability to different images untouched. Moreover, recent research demonstrates that MLP-Transformer is great at creating long-range dependencies but ineffective at catching high frequencies that primarily transmit local information, which prevents it from applying to downstream dense prediction tasks, such as semantic segmentation. To address these challenges, we propose a content-adaptive yet computationally efficient structure, dubbed Dynamic Spectrum Mixer (DSM). The DSM represents token interactions in the frequency domain by employing the Discrete Cosine Transform, which can learn long-term spatial dependencies with log-linear complexity. Furthermore, a dynamic spectrum weight generation layer is proposed as the spectrum bands selector, which can emphasize the informative frequency bands while diminishing others. As a result, the technique can efficiently learn detailed features from visual input that contains both high- and low-frequency information. Extensive experiments show that DSM is a powerful and adaptable backbone for a range of visual recognition tasks. In particular, DSM outperforms previous transformer-based and MLP-based models on image classification, object detection, and semantic segmentation tasks, achieving, for example, 83.8% top-1 accuracy on ImageNet and 49.9% mIoU on ADE20K.
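
The frequency-domain mixing described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it works on a 1-D list of scalar token features, uses a dense DCT-II matrix (quadratic cost, whereas a fast transform would give the log-linear cost the abstract mentions), and takes the band weights as a plain argument. In DSM those weights come from a learned, input-dependent layer; the function names `dct_matrix` and `spectrum_mix` and the `band_weights` parameter are hypothetical.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix C (n x n); its transpose is its inverse."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append(
            [scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
        )
    return rows


def spectrum_mix(tokens, band_weights):
    """Transform tokens to the DCT domain, reweight each band, transform back.

    band_weights is a hypothetical stand-in for the output of a learned
    spectrum weight generation layer; here it is just a list of floats.
    """
    n = len(tokens)
    c = dct_matrix(n)
    # Forward transform: spectrum[k] = sum_i C[k][i] * tokens[i]
    spectrum = [sum(c[k][i] * tokens[i] for i in range(n)) for k in range(n)]
    # Emphasize or suppress individual frequency bands.
    reweighted = [w * s for w, s in zip(band_weights, spectrum)]
    # Inverse transform: out[i] = sum_k C[k][i] * reweighted[k]
    return [sum(c[k][i] * reweighted[k] for k in range(n)) for i in range(n)]


if __name__ == "__main__":
    tokens = [1.0, 2.0, 3.0, 4.0]
    # Keeping only the lowest band leaves the global average at every position.
    print(spectrum_mix(tokens, [1.0, 0.0, 0.0, 0.0]))
```

With all weights set to 1 the input is recovered unchanged, and keeping only the first band spreads the mean of the tokens to every position, which is the sense in which a single band carries global, long-range information.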

A Laplacian Pyramid Based Generative H&E Stain Augmentation Network

May 23, 2023

Fangda Li, Zhiqiang Hu, Wen Chen, Avinash Kak

Abstract:Hematoxylin and Eosin (H&E) staining is a widely used sample preparation procedure for enhancing the saturation of tissue sections and the contrast between nuclei and cytoplasm in histology images for medical diagnostics. However, various factors, such as the differences in the reagents used, result in high variability in the colors of the stains actually recorded. This variability poses a challenge in achieving generalization for machine-learning-based computer-aided diagnostic tools. To desensitize the learned models to stain variations, we propose the Generative Stain Augmentation Network (G-SAN), a GAN-based framework that augments a collection of cell images with simulated yet realistic stain variations. At its core, G-SAN uses a novel and highly computationally efficient Laplacian Pyramid (LP) based generator architecture that is capable of disentangling stain from cell morphology. Through the tasks of patch classification and nucleus segmentation, we show that using G-SAN-augmented training data provides on average a 15.7% improvement in F1 score and a 7.3% improvement in panoptic quality, respectively. Our code is available at https://github.com/lifangda01/GSAN-Demo.

Adapter-TST: A Parameter Efficient Method for Multiple-Attribute Text Style Transfer

May 10, 2023

Zhiqiang Hu, Roy Ka-Wei Lee, Nancy F. Chen

Abstract:Adapting a large language model for multiple-attribute text style transfer via fine-tuning can be challenging due to the significant amount of computational resources and labeled data required for the specific task. In this paper, we address this challenge by introducing Adapter-TST, a framework that freezes the pre-trained model's original parameters and enables the development of a multiple-attribute text style transfer model. Using BART as the backbone model, Adapter-TST utilizes different neural adapters to capture different attribute information, like plug-ins connected to BART. Our method allows control over multiple attributes, such as sentiment, tense, and voice, and configures the adapters' architecture to generate multiple outputs with respect to attributes or to perform compositional editing on the same sentence. We evaluate the proposed model on both traditional sentiment transfer and multiple-attribute transfer tasks. The experimental results demonstrate that Adapter-TST outperforms all the state-of-the-art baselines with significantly fewer computational resources. We have also empirically shown that each adapter is able to capture specific stylistic attributes effectively and can be configured to perform compositional editing.

* 11 pages, 3 figures

Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models

May 06, 2023

Lei Wang, Wanyu Xu, Yihuai Lan, Zhiqiang Hu, Yunshi Lan, Roy Ka-Wei Lee, Ee-Peng Lim

Abstract:Large language models (LLMs) have recently been shown to deliver impressive performance in various NLP tasks. To tackle multi-step reasoning tasks, few-shot chain-of-thought (CoT) prompting includes a few manually crafted step-by-step reasoning demonstrations which enable LLMs to explicitly generate reasoning steps and improve their reasoning task accuracy. To eliminate the manual effort, Zero-shot-CoT concatenates the target problem statement with "Let's think step by step" as an input prompt to LLMs. Despite the success of Zero-shot-CoT, it still suffers from three pitfalls: calculation errors, missing-step errors, and semantic misunderstanding errors. To address the missing-step errors, we propose Plan-and-Solve (PS) Prompting. It consists of two components: first, devising a plan to divide the entire task into smaller subtasks, and then carrying out the subtasks according to the plan. To address the calculation errors and improve the quality of generated reasoning steps, we extend PS prompting with more detailed instructions and derive PS+ prompting. We evaluate our proposed prompting strategy on ten datasets across three reasoning problems. The experimental results over GPT-3 show that our proposed zero-shot prompting consistently outperforms Zero-shot-CoT across all datasets by a large margin, is comparable to or exceeds Zero-shot-Program-of-Thought Prompting, and has comparable performance with 8-shot CoT prompting on the math reasoning problem. The code can be found at https://github.com/AGI-Edgerunners/Plan-and-Solve-Prompting.
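
The difference between the prompting strategies in this abstract comes down to the trigger sentence appended to the problem. The sketch below only illustrates that idea. The Zero-shot-CoT trigger is quoted from the abstract; the Plan-and-Solve and PS+ trigger strings are hypothetical wordings written to match the abstract's description, not the prompts used in the paper, and the `Q: ... A: ...` layout and the `build_prompt` helper are likewise assumptions.

```python
# Trigger quoted in the abstract for Zero-shot-CoT.
ZERO_SHOT_COT = "Let's think step by step"

# Hypothetical wording: first devise a plan, then carry it out.
PLAN_AND_SOLVE = (
    "Let's first understand the problem and devise a plan to solve it. "
    "Then, let's carry out the plan and solve the problem step by step."
)

# Hypothetical wording: same two stages plus more detailed instructions
# aimed at calculation errors, as the abstract describes for PS+.
PLAN_AND_SOLVE_PLUS = (
    "Let's first understand the problem, extract the relevant variables and "
    "their values, and devise a plan. Then, let's carry out the plan, "
    "calculate intermediate results carefully, and solve the problem step by step."
)


def build_prompt(problem: str, trigger: str) -> str:
    """Concatenate the problem statement with a zero-shot trigger sentence."""
    return f"Q: {problem.strip()}\nA: {trigger}"


if __name__ == "__main__":
    question = "A shop sells pens at 3 dollars each. How much do 7 pens cost?"
    for trigger in (ZERO_SHOT_COT, PLAN_AND_SOLVE, PLAN_AND_SOLVE_PLUS):
        print(build_prompt(question, trigger))
        print("---")
```

The resulting string would be sent to a language model as-is; no demonstrations are included, which is what makes all three variants zero-shot.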

* ACL 2023
