Tao Yu

Yi: Open Foundation Models by 01.AI

Mar 07, 2024

01.AI, Alex Young, Bei Chen, Chao Li, Chengen Huang, Ge Zhang, Guanwei Zhang, Heng Li, Jiangcheng Zhu (+22 more)

Abstract:We introduce the Yi model family, a series of language and multimodal models that demonstrate strong multi-dimensional capabilities. The Yi model family is based on 6B and 34B pretrained language models, then we extend them to chat models, 200K long context models, depth-upscaled models, and vision-language models. Our base models achieve strong performance on a wide range of benchmarks like MMLU, and our finetuned chat models deliver strong human preference rate on major evaluation platforms like AlpacaEval and Chatbot Arena. Building upon our scalable super-computing infrastructure and the classical transformer architecture, we attribute the performance of Yi models primarily to its data quality resulting from our data-engineering efforts. For pretraining, we construct 3.1 trillion tokens of English and Chinese corpora using a cascaded data deduplication and quality filtering pipeline. For finetuning, we polish a small scale (less than 10K) instruction dataset over multiple iterations such that every single instance has been verified directly by our machine learning engineers. For vision-language, we combine the chat language model with a vision transformer encoder and train the model to align visual representations to the semantic space of the language model. We further extend the context length to 200K through lightweight continual pretraining and demonstrate strong needle-in-a-haystack retrieval performance. We show that extending the depth of the pretrained checkpoint through continual pretraining further improves performance. We believe that given our current results, continuing to scale up model parameters using thoroughly optimized data will lead to even stronger frontier models.

Automated Design and Optimization of Distributed Filtering Circuits via Reinforcement Learning

Feb 22, 2024

Peng Gao, Tao Yu, Fei Wang, Ru-Yue Yuan

Abstract:Designing distributed filtering circuits (DFCs) is complex and time-consuming, with the circuit performance relying heavily on the expertise and experience of electronics engineers. However, manual design methods tend to have exceedingly low efficiency. This study proposes a novel end-to-end automated method for fabricating circuits to improve the design of DFCs. The proposed method harnesses reinforcement learning (RL) algorithms, eliminating the dependence on the design experience of engineers. Thus, it significantly reduces the subjectivity and constraints associated with circuit design. The experimental findings demonstrate clear improvements in both design efficiency and quality when comparing the proposed method with traditional engineer-driven methods. In particular, the proposed method achieves superior performance when designing complex or rapidly evolving DFCs. Furthermore, compared to existing circuit automation design techniques, the proposed method demonstrates superior design efficiency, highlighting the substantial potential of RL in circuit design automation.

* 13 pages, 7 figures, 4 tables

YOLO-TLA: An Efficient and Lightweight Small Object Detection Model based on YOLOv5

Feb 22, 2024

Peng Gao, Chun-Lin Ji, Tao Yu, Ru-Yue Yuan

Abstract:Object detection, a crucial aspect of computer vision, has seen significant advancements in accuracy and robustness. Despite these advancements, practical applications still face notable challenges, primarily the inaccurate detection or missed detection of small objects. In this paper, we propose YOLO-TLA, an advanced object detection model building on YOLOv5. We first introduce an additional detection layer for small objects in the neck network pyramid architecture, thereby producing a feature map of a larger scale to discern finer features of small objects. Further, we integrate the C3CrossCovn module into the backbone network. This module uses sliding window feature extraction, which effectively minimizes both computational demand and the number of parameters, rendering the model more compact. Additionally, we have incorporated a global attention mechanism into the backbone network. This mechanism combines the channel information with global information to create a weighted feature map. This feature map is tailored to highlight the attributes of the object of interest, while effectively ignoring irrelevant details. In comparison to the baseline YOLOv5s model, our newly developed YOLO-TLA model has shown considerable improvements on the MS COCO validation dataset, with increases of 4.6% in mAP@0.5 and 4% in mAP@0.5:0.95, all while keeping the model size compact at 9.49M parameters. Further extending these improvements to the YOLOv5m model, the enhanced version exhibited a 1.7% and 1.9% increase in mAP@0.5 and mAP@0.5:0.95, respectively, with a total of 27.53M parameters. These results validate the YOLO-TLA model's efficient and effective performance in small object detection, achieving high accuracy with fewer parameters and computational demands.

* 11 pages, 11 figures, 7 tables
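
The mAP@0.5 and mAP@0.5:0.95 figures quoted above depend on the intersection-over-union (IoU) between predicted and ground-truth boxes. As a minimal sketch of that standard definition (not code from the YOLO-TLA paper), the helper below computes IoU for axis-aligned boxes and lists the ten IoU thresholds that the COCO-style mAP@0.5:0.95 averages over. The function names and the `(x1, y1, x2, y2)` box layout are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2). Layout is an assumption."""
    # Corners of the overlap rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def coco_iou_thresholds():
    """IoU thresholds 0.50, 0.55, ..., 0.95 averaged by mAP@0.5:0.95."""
    return [round(0.5 + 0.05 * i, 2) for i in range(10)]


if __name__ == "__main__":
    prediction = (0.0, 0.0, 2.0, 2.0)
    ground_truth = (1.0, 1.0, 3.0, 3.0)
    score = iou(prediction, ground_truth)
    # A detection counts as a match at threshold t only if its IoU is at least t.
    matches = [t for t in coco_iou_thresholds() if score >= t]
    print(score, matches)
```

Small objects are sensitive to this metric because a shift of a few pixels removes a large fraction of the overlap, which is one reason small-object detection is reported separately from overall accuracy.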

ARKS: Active Retrieval in Knowledge Soup for Code Generation

Feb 19, 2024

Hongjin Su, Shuyang Jiang, Yuhang Lai, Haoyuan Wu, Boao Shi, Che Liu, Qian Liu, Tao Yu

Abstract:Recently the retrieval-augmented generation (RAG) paradigm has raised much attention for its potential in incorporating external knowledge into large language models (LLMs) without further training. While widely explored in natural language applications, its utilization in code generation remains under-explored. In this paper, we introduce Active Retrieval in Knowledge Soup (ARKS), an advanced strategy for generalizing large language models for code. In contrast to relying on a single source, we construct a knowledge soup integrating web search, documentation, execution feedback, and evolved code snippets. We employ an active retrieval strategy that iteratively refines the query and updates the knowledge soup. To assess the performance of ARKS, we compile a new benchmark comprising realistic coding problems associated with frequently updated libraries and long-tail programming languages. Experimental results on ChatGPT and CodeLlama demonstrate a substantial improvement in the average execution accuracy of ARKS on LLMs. The analysis confirms the effectiveness of our proposed knowledge soup and active retrieval strategies, offering rich insights into the construction of effective retrieval-augmented code generation (RACG) pipelines. Our model, code, and data are available at https://arks-codegen.github.io.

* Retrieval-augmented code generation

Generative Representational Instruction Tuning

Feb 15, 2024

Niklas Muennighoff, Hongjin Su, Liang Wang, Nan Yang, Furu Wei, Tao Yu, Amanpreet Singh, Douwe Kiela

Abstract:All text-based language problems can be reduced to either generation or embedding. Current models only perform well at one or the other. We introduce generative representational instruction tuning (GRIT) whereby a large language model is trained to handle both generative and embedding tasks by distinguishing between them through instructions. Compared to other open models, our resulting GritLM 7B sets a new state of the art on the Massive Text Embedding Benchmark (MTEB) and outperforms all models up to its size on a range of generative tasks. By scaling up further, GritLM 8x7B outperforms all open generative language models that we tried while still being among the best embedding models. Notably, we find that GRIT matches training on only generative or embedding data, thus we can unify both at no performance loss. Among other benefits, the unification via GRIT speeds up Retrieval-Augmented Generation (RAG) by > 60% for long documents, by no longer requiring separate retrieval and generation models. Models, code, etc. are freely available at https://github.com/ContextualAI/gritlm.

* 65 pages (15 main), 25 figures, 33 tables

OS-Copilot: Towards Generalist Computer Agents with Self-Improvement

Feb 15, 2024

Zhiyong Wu, Chengcheng Han, Zichen Ding, Zhenmin Weng, Zhoumianze Liu, Shunyu Yao, Tao Yu, Lingpeng Kong

Abstract:Autonomous interaction with the computer has been a longstanding challenge with great potential, and the recent proliferation of large language models (LLMs) has markedly accelerated progress in building digital agents. However, most of these agents are designed to interact with a narrow domain, such as a specific software or website. This narrow focus constrains their applicability for general computer tasks. To this end, we introduce OS-Copilot, a framework to build generalist agents capable of interfacing with comprehensive elements in an operating system (OS), including the web, code terminals, files, multimedia, and various third-party applications. We use OS-Copilot to create FRIDAY, a self-improving embodied agent for automating general computer tasks. On GAIA, a general AI assistants benchmark, FRIDAY outperforms previous methods by 35%, showcasing strong generalization to unseen applications via accumulated skills from previous tasks. We also present numerical and quantitative evidence that FRIDAY learns to control and self-improve on Excel and Powerpoint with minimal supervision. Our OS-Copilot framework and empirical findings provide infrastructure and insights for future research toward more capable and general-purpose computer agents.

* Project page: https://os-copilot.github.io

Momentum Approximation in Asynchronous Private Federated Learning

Feb 14, 2024

Tao Yu, Congzheng Song, Jianyu Wang, Mona Chitnis

Abstract:Asynchronous protocols have been shown to improve the scalability of federated learning (FL) with a massive number of clients. Meanwhile, momentum-based methods can achieve the best model quality in synchronous FL. However, naively applying momentum in asynchronous FL algorithms leads to slower convergence and degraded model performance. It is still unclear how to effectively combine these two techniques to achieve a win-win. In this paper, we find that asynchrony introduces implicit bias to momentum updates. In order to address this problem, we propose momentum approximation that minimizes the bias by finding an optimal weighted average of all historical model updates. Momentum approximation is compatible with secure aggregation as well as differential privacy, and can be easily integrated in production FL systems with a minor communication and storage cost. We empirically demonstrate that on benchmark FL datasets, momentum approximation can achieve $1.15 \textrm{--}4\times$ speedup in convergence compared to existing asynchronous FL optimizers with momentum.
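
To see why a weighted combination of historical updates is a natural target, it may help to recall that the usual momentum recursion m_t = beta * m_{t-1} + delta_t already unrolls into a weighted sum of all past updates with geometric weights beta^(t-s). The sketch below checks that identity for scalar updates. It illustrates only this textbook fact about synchronous momentum; it does not reproduce the paper's method for choosing weights under asynchrony, and all function names are hypothetical.

```python
def momentum_recursive(updates, beta):
    """Standard momentum buffer: m_t = beta * m_{t-1} + delta_t, starting from 0."""
    m = 0.0
    for delta in updates:
        m = beta * m + delta
    return m


def momentum_weights(num_updates, beta):
    """Geometric weights beta^(t-s) that the recursion implicitly assigns to update s."""
    return [beta ** (num_updates - 1 - s) for s in range(num_updates)]


def momentum_as_weighted_sum(updates, beta):
    """Same momentum value, written as an explicit weighted sum over the history."""
    weights = momentum_weights(len(updates), beta)
    return sum(w * d for w, d in zip(weights, updates))


if __name__ == "__main__":
    history = [1.0, 2.0, 4.0]  # toy scalar model updates, oldest first
    print(momentum_recursive(history, 0.5))
    print(momentum_as_weighted_sum(history, 0.5))
```

Both functions return the same value for any update history, so a scheme that picks the weights on historical updates can in principle aim to match the weights that synchronous momentum would have produced.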

Stochastic Dynamic Power Dispatch with High Generalization and Few-Shot Adaption via Contextual Meta Graph Reinforcement Learning

Jan 19, 2024

Bairong Deng, Tao Yu, Zhenning Pan, Xuehan Zhang, Yufeng Wu, Qiaoyi Ding

Abstract:Reinforcement learning is an emerging approach for solving multi-stage sequential decision-making problems. This paper studies a real-time multi-stage stochastic power dispatch considering multivariate uncertainties. Current research suffers from low generalization and practicality: the learned dispatch policy can only handle a specific dispatch scenario, and its performance degrades significantly if actual samples and training samples are inconsistent. To fill these gaps, a novel contextual meta graph reinforcement learning (Meta-GRL) for a highly generalized multi-stage optimal dispatch policy is proposed. Specifically, a more general contextual Markov decision process (MDP) and scalable graph representation are introduced to achieve a more generalized multi-stage stochastic power dispatch modeling. An upper meta-learner is proposed to encode context for different dispatch scenarios and learn how to achieve dispatch task identification while the lower policy learner learns context-specified dispatch policy. After sufficient offline learning, this approach can rapidly adapt to unseen and undefined scenarios with only a few updates of the hypothesis judgments generated by the meta-learner. Numerical comparisons with state-of-the-art policies and traditional reinforcement learning verify the optimality, efficiency, adaptability, and scalability of the proposed Meta-GRL.

Fluctuation-based Adaptive Structured Pruning for Large Language Models

Dec 19, 2023

Yongqi An, Xu Zhao, Tao Yu, Ming Tang, Jinqiao Wang

Abstract:Network pruning is a promising way to address the huge computing resource demands of the deployment and inference of Large Language Models (LLMs). Being retraining-free is important for LLM pruning methods. However, almost all of the existing retraining-free pruning approaches for LLMs focus on unstructured pruning, which requires specific hardware support for acceleration. In this paper, we propose a novel retraining-free structured pruning framework for LLMs, named FLAP (FLuctuation-based Adaptive Structured Pruning). It is hardware-friendly by effectively reducing storage and enhancing inference speed. For effective structured pruning of LLMs, we highlight three critical elements that demand the utmost attention: formulating structured importance metrics, adaptively searching the global compressed model, and implementing compensation mechanisms to mitigate performance loss. First, FLAP determines whether the output feature map is easily recoverable when a column of weight is removed, based on the fluctuation pruning metric. Then it standardizes the importance scores to adaptively determine the global compressed model structure. Finally, FLAP adds additional bias terms to recover the output feature maps using the baseline values. We thoroughly evaluate our approach on a variety of language benchmarks. Without any retraining, our method significantly outperforms the state-of-the-art methods, including LLM-Pruner and the extension of Wanda in structured pruning. The code is released at https://github.com/CASIA-IVA-Lab/FLAP.

* Accepted to AAAI 2024
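
The three steps named in the abstract (a fluctuation-based score per weight column, standardized scores, and a bias term that restores a baseline output) can be pictured with a toy single-layer example. The sketch below is an illustration under stated assumptions, not the released FLAP code: it assumes the score of an input channel is the variance of that channel over calibration samples times the squared norm of the matching weight column, and that the baseline value is the channel's sample mean. All function names are hypothetical.

```python
from statistics import mean, pstdev, pvariance


def fluctuation_scores(inputs, weight):
    """Assumed score for input channel j: sample variance of feature j
    times the squared L2 norm of weight column j (weight is d_out x d_in)."""
    d_in = len(weight[0])
    scores = []
    for j in range(d_in):
        var_j = pvariance([x[j] for x in inputs])
        col_norm_sq = sum(row[j] ** 2 for row in weight)
        scores.append(var_j * col_norm_sq)
    return scores


def standardize(scores):
    """Z-score the scores so layers with different scales become comparable."""
    mu, sigma = mean(scores), pstdev(scores)
    return [(s - mu) / sigma if sigma > 0 else 0.0 for s in scores]


def prune_with_bias(inputs, weight, bias, keep):
    """Drop input channels not in `keep` and fold their mean contribution into the bias."""
    d_in = len(weight[0])
    baseline = [mean(x[j] for x in inputs) for j in range(d_in)]
    pruned = [j for j in range(d_in) if j not in keep]
    new_bias = [b + sum(row[j] * baseline[j] for j in pruned)
                for row, b in zip(weight, bias)]
    new_weight = [[row[j] for j in sorted(keep)] for row in weight]
    return new_weight, new_bias


if __name__ == "__main__":
    calib = [[1.0, 5.0], [3.0, 5.0]]   # channel 1 never changes across samples
    W, b = [[2.0, 1.0]], [0.0]
    print(standardize(fluctuation_scores(calib, W)))
    print(prune_with_bias(calib, W, b, keep={0}))
```

In this toy case channel 1 is constant, so its score is zero and replacing it with a bias term reproduces the original layer output exactly; channels that fluctuate more are harder to recover this way and score higher.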

Internet of Federated Digital Twins (IoFDT): Connecting Twins Beyond Borders for Society 5.0

Dec 11, 2023

Tao Yu, Zongdian Li, Kei Sakaguchi, Omar Hashash, Walid Saad, Merouane Debbah

Abstract:The concept of digital twin (DT), which enables the creation of a programmable, digital representation of physical systems, is expected to revolutionize future industries and will lie at the heart of the vision of a future smart society, namely, Society 5.0, in which high integration between cyber (digital) and physical spaces is exploited to bring economic and societal advancements. However, the success of such a DT-driven Society 5.0 requires a synergistic convergence of artificial intelligence and networking technologies into an integrated, programmable system that can coordinate networks of DTs to effectively deliver diverse Society 5.0 services. Prior works remain restricted to either qualitative study, simple analysis or software implementations of a single DT, and thus, they cannot provide the highly synergistic integration of digital and physical spaces as required by Society 5.0. In contrast, this paper envisions a novel concept of an Internet of Federated Digital Twins (IoFDT) that holistically integrates heterogeneous and physically separated DTs representing different Society 5.0 services within a single framework and system. For this concept of IoFDT, we first introduce a hierarchical architecture that integrates federated DTs through horizontal and vertical interactions, bridging the cyber and physical spaces to unlock new possibilities. Then, we discuss the challenges of realizing IoFDT, highlighting the intricacies across communication, computing, and AI-native networks while also underscoring potential innovative solutions. Subsequently, we elaborate on the importance of the implementation of a unified IoFDT platform that integrates all technical components and orchestrates their interactions, emphasizing the necessity of practical experimental platforms with a focus on real-world applications in areas like smart mobility.
