Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jing Jiang

GliDe with a CaPE: A Low-Hassle Method to Accelerate Speculative Decoding

Feb 03, 2024

Cunxiao Du, Jing Jiang, Xu Yuanchen, Jiawei Wu, Sicheng Yu, Yongqi Li, Shenggui Li, Kai Xu, Liqiang Nie, Zhaopeng Tu(+1 more)

Figure 1 for GliDe with a CaPE: A Low-Hassle Method to Accelerate Speculative Decoding

Figure 2 for GliDe with a CaPE: A Low-Hassle Method to Accelerate Speculative Decoding

Figure 3 for GliDe with a CaPE: A Low-Hassle Method to Accelerate Speculative Decoding

Figure 4 for GliDe with a CaPE: A Low-Hassle Method to Accelerate Speculative Decoding

Abstract:Speculative decoding is a relatively new decoding framework that leverages small and efficient draft models to reduce the latency of LLMs. In this study, we introduce GliDe and CaPE, two low-hassle modifications to vanilla speculative decoding to further improve the decoding speed of a frozen LLM. Specifically, GliDe is a modified draft model architecture that reuses the cached keys and values from the target LLM, while CaPE is a proposal expansion method that uses the draft model's confidence scores to help select additional candidate tokens for verification. Extensive experiments on different benchmarks demonstrate that our proposed GliDe draft model significantly reduces the expected decoding latency. Additional evaluation using walltime reveals that GliDe can accelerate Vicuna models up to 2.17x and further extend the improvement to 2.61x with CaPE. We will release our code, data, and the trained draft models.

Via

Access Paper or Ask Questions

Revisiting the Markov Property for Machine Translation

Feb 03, 2024

Cunxiao Du, Hao Zhou, Zhaopeng Tu, Jing Jiang

Figure 1 for Revisiting the Markov Property for Machine Translation

Figure 2 for Revisiting the Markov Property for Machine Translation

Figure 3 for Revisiting the Markov Property for Machine Translation

Figure 4 for Revisiting the Markov Property for Machine Translation

Abstract:In this paper, we re-examine the Markov property in the context of neural machine translation. We design a Markov Autoregressive Transformer~(MAT) and undertake a comprehensive assessment of its performance across four WMT benchmarks. Our findings indicate that MAT with an order larger than 4 can generate translations with quality on par with that of conventional autoregressive transformers. In addition, counter-intuitively, we also find that the advantages of utilizing a higher-order MAT do not specifically contribute to the translation of longer sentences.

* EACL (Findings)

Via

Access Paper or Ask Questions

Enhancing Document-level Translation of Large Language Model via Translation Mixed-instructions

Jan 16, 2024

Yachao Li, Junhui Li, Jing Jiang, Min Zhang

Abstract:Existing large language models (LLMs) for machine translation are typically fine-tuned on sentence-level translation instructions and achieve satisfactory performance at the sentence level. However, when applied to document-level translation, these models face a significant challenge, particularly when dealing with documents containing over 512 tokens. This challenge arises from the issue of sentence-level coverage, where subsequent sentences in the document remain untranslated. As a result, the document-level translation capability of LLMs fine-tuned on sentence-level translation instructions is significantly limited. We conjecture that the primary cause of LLMs' weak document-level translation performance is the absence of document-to-document mapping ability. To address the issue, we propose an approach that combines sentence-level and document-level translation instructions of varying lengths to fine-tune LLMs. Our proposed translation mixed-instructions enable LLMs (Llama-2~7B and 13B) to maintain consistent translation performance from the sentence level to documents containing as many as 2048 tokens. Extensive experimental results show that the proposed approach significantly enhances the document-level translation capabilities of LLMs on 10 language pairs, effectively mitigating the sentence-level coverage issue in document-level translation. Experimentation on discourse phenomena has demonstrated that our document-level translation approach significantly improves translation quality, both in terms of BLEU score and discourse coherence.

* under review

Via

Access Paper or Ask Questions

Foundation Models for Weather and Climate Data Understanding: A Comprehensive Survey

Dec 05, 2023

Shengchao Chen, Guodong Long, Jing Jiang, Dikai Liu, Chengqi Zhang

Abstract:As artificial intelligence (AI) continues to rapidly evolve, the realm of Earth and atmospheric sciences is increasingly adopting data-driven models, powered by progressive developments in deep learning (DL). Specifically, DL techniques are extensively utilized to decode the chaotic and nonlinear aspects of Earth systems, and to address climate challenges via understanding weather and climate data. Cutting-edge performance on specific tasks within narrower spatio-temporal scales has been achieved recently through DL. The rise of large models, specifically large language models (LLMs), has enabled fine-tuning processes that yield remarkable outcomes across various downstream tasks, thereby propelling the advancement of general AI. However, we are still navigating the initial stages of crafting general AI for weather and climate. In this survey, we offer an exhaustive, timely overview of state-of-the-art AI methodologies specifically engineered for weather and climate data, with a special focus on time series and text data. Our primary coverage encompasses four critical aspects: types of weather and climate data, principal model architectures, model scopes and applications, and datasets for weather and climate. Furthermore, in relation to the creation and application of foundation models for weather and climate data understanding, we delve into the field's prevailing challenges, offer crucial insights, and propose detailed avenues for future research. This comprehensive approach equips practitioners with the requisite knowledge to make substantial progress in this domain. Our survey encapsulates the most recent breakthroughs in research on large, data-driven models for weather and climate data understanding, emphasizing robust foundations, current advancements, practical applications, crucial resources, and prospective research opportunities.

* Ongoing work. Survey Paper. 35 pages, 2 figures, 4 tables. The first work to comprehensively and systematically summarize DL-based weather and climate data understanding, paving the way for the development of weather and climate foundation models

Via

Access Paper or Ask Questions

Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld

Nov 28, 2023

Yijun Yang, Tianyi Zhou, Kanxue Li, Dapeng Tao, Lusong Li, Li Shen, Xiaodong He, Jing Jiang, Yuhui Shi

Figure 1 for Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld

Figure 2 for Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld

Figure 3 for Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld

Figure 4 for Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld

Abstract:While large language models (LLMs) excel in a simulated world of texts, they struggle to interact with the more realistic world without perceptions of other modalities such as visual or audio signals. Although vision-language models (VLMs) integrate LLM modules (1) aligned with static image features, and (2) may possess prior knowledge of world dynamics (as demonstrated in the text world), they have not been trained in an embodied visual world and thus cannot align with its dynamics. On the other hand, training an embodied agent in a noisy visual world without expert guidance is often challenging and inefficient. In this paper, we train a VLM agent living in a visual world using an LLM agent excelling in a parallel text world (but inapplicable to the visual world). Specifically, we distill LLM's reflection outcomes (improved actions by analyzing mistakes) in a text world's tasks to finetune the VLM on the same tasks of the visual world, resulting in an Embodied Multi-Modal Agent (EMMA) quickly adapting to the visual world dynamics. Such cross-modality imitation learning between the two parallel worlds enables EMMA to generalize to a broad scope of new tasks without any further guidance from the LLM expert. Extensive evaluations on the ALFWorld benchmark highlight EMMA's superior performance to SOTA VLM-based agents across diverse tasks, e.g., 20%-70% improvement in the success rate.

Via

Access Paper or Ask Questions

Uplift Modeling based on Graph Neural Network Combined with Causal Knowledge

Nov 14, 2023

Haowen Wang, Xinyan Ye, Yangze Zhou, Zhiyi Zhang, Longhan Zhang, Jing Jiang

Figure 1 for Uplift Modeling based on Graph Neural Network Combined with Causal Knowledge

Figure 2 for Uplift Modeling based on Graph Neural Network Combined with Causal Knowledge

Figure 3 for Uplift Modeling based on Graph Neural Network Combined with Causal Knowledge

Figure 4 for Uplift Modeling based on Graph Neural Network Combined with Causal Knowledge

Abstract:Uplift modeling is a fundamental component of marketing effect modeling, which is commonly employed to evaluate the effects of treatments on outcomes. Through uplift modeling, we can identify the treatment with the greatest benefit. On the other side, we can identify clients who are likely to make favorable decisions in response to a certain treatment. In the past, uplift modeling approaches relied heavily on the difference-in-difference (DID) architecture, paired with a machine learning model as the estimation learner, while neglecting the link and confidential information between features. We proposed a framework based on graph neural networks that combine causal knowledge with an estimate of uplift value. Firstly, we presented a causal representation technique based on CATE (conditional average treatment effect) estimation and adjacency matrix structure learning. Secondly, we suggested a more scalable uplift modeling framework based on graph convolution networks for combining causal knowledge. Our findings demonstrate that this method works effectively for predicting uplift values, with small errors in typical simulated data, and its effectiveness has been verified in actual industry marketing data.

* 6 pages, 6 figures

Via

Access Paper or Ask Questions

Intriguing Properties of Data Attribution on Diffusion Models

Nov 01, 2023

Xiaosen Zheng, Tianyu Pang, Chao Du, Jing Jiang, Min Lin

Abstract:Data attribution seeks to trace model outputs back to training data. With the recent development of diffusion models, data attribution has become a desired module to properly assign valuations for high-quality or copyrighted training samples, ensuring that data contributors are fairly compensated or credited. Several theoretically motivated methods have been proposed to implement data attribution, in an effort to improve the trade-off between computational scalability and effectiveness. In this work, we conduct extensive experiments and ablation studies on attributing diffusion models, specifically focusing on DDPMs trained on CIFAR-10 and CelebA, as well as a Stable Diffusion model LoRA-finetuned on ArtBench. Intriguingly, we report counter-intuitive observations that theoretically unjustified design choices for attribution empirically outperform previous baselines by a large margin, in terms of both linear datamodeling score and counterfactual evaluation. Our work presents a significantly more efficient approach for attributing diffusion models, while the unexpected findings suggest that at least in non-convex settings, constructions guided by theoretical assumptions may lead to inferior attribution performance. The code is available at https://github.com/sail-sg/D-TRAK.

Via

Access Paper or Ask Questions

ROME: Evaluating Pre-trained Vision-Language Models on Reasoning beyond Visual Common Sense

Oct 30, 2023

Kankan Zhou, Eason Lai, Wei Bin Au Yeong, Kyriakos Mouratidis, Jing Jiang

Abstract:Humans possess a strong capability for reasoning beyond common sense. For example, given an unconventional image of a goldfish laying on the table next to an empty fishbowl, a human would effortlessly determine that the fish is not inside the fishbowl. The case, however, may be different for a vision-language model, whose reasoning could gravitate towards the common scenario that the fish is inside the bowl, despite the visual input. In this paper, we introduce a novel probing dataset named ROME (reasoning beyond commonsense knowledge) to evaluate whether the state-of-the-art pre-trained vision-language models have the reasoning capability to correctly interpret counter-intuitive content. ROME contains images that defy commonsense knowledge with regards to color, shape, material, size and positional relation. Experiments on the state-of-the-art pre-trained vision-language models reveal that most of these models are still largely incapable of interpreting counter-intuitive scenarios. We hope that ROME will spur further investigations on reasoning beyond commonsense knowledge in vision-language research.

* This is the camera-ready version of the paper that will be published in the EMNLP 2023 Findings (Singapore, 6-10 December 2023)

Via

Access Paper or Ask Questions

Curriculum Reinforcement Learning via Morphology-Environment Co-Evolution

Sep 21, 2023

Shuang Ao, Tianyi Zhou, Guodong Long, Xuan Song, Jing Jiang

Abstract:Throughout long history, natural species have learned to survive by evolving their physical structures adaptive to the environment changes. In contrast, current reinforcement learning (RL) studies mainly focus on training an agent with a fixed morphology (e.g., skeletal structure and joint attributes) in a fixed environment, which can hardly generalize to changing environments or new tasks. In this paper, we optimize an RL agent and its morphology through ``morphology-environment co-evolution (MECE)'', in which the morphology keeps being updated to adapt to the changing environment, while the environment is modified progressively to bring new challenges and stimulate the improvement of the morphology. This leads to a curriculum to train generalizable RL, whose morphology and policy are optimized for different environments. Instead of hand-crafting the curriculum, we train two policies to automatically change the morphology and the environment. To this end, (1) we develop two novel and effective rewards for the two policies, which are solely based on the learning dynamics of the RL agent; (2) we design a scheduler to automatically determine when to change the environment and the morphology. In experiments on two classes of tasks, the morphology and RL policies trained via MECE exhibit significantly better generalization performance in unseen test environments than SOTA morphology optimization methods. Our ablation studies on the two MECE policies further show that the co-evolution between the morphology and environment is the key to the success.

Via

Access Paper or Ask Questions

GPT-Lab: Next Generation Of Optimal Chemistry Discovery By GPT Driven Robotic Lab

Sep 15, 2023

Xiaokai Qin, Mingda Song, Yangguan Chen, Zhehong Ai, Jing Jiang

Abstract:The integration of robots in chemical experiments has enhanced experimental efficiency, but lacking the human intelligence to comprehend literature, they seldom provide assistance in experimental design. Therefore, achieving full-process autonomy from experiment design to validation in self-driven laboratories (SDL) remains a challenge. The introduction of Generative Pre-trained Transformers (GPT), particularly GPT-4, into robotic experimentation offers a solution. We introduce GPT-Lab, a paradigm that employs GPT models to give robots human-like intelligence. With our robotic experimentation platform, GPT-Lab mines literature for materials and methods and validates findings through high-throughput synthesis. As a demonstration, GPT-Lab analyzed 500 articles, identified 18 potential reagents, and successfully produced an accurate humidity colorimetric sensor with a root mean square error (RMSE) of 2.68%. This showcases the rapid materials discovery and validation potential of our system.

Via

Access Paper or Ask Questions