Li Chen

College of Computer and Artificial Intelligence, Zhengzhou University, Institute of Physical Education

Mixed-Precision Quantization: Make the Best Use of Bits Where They Matter Most

Dec 05, 2024

Yiming Fang, Li Chen, Yunfei Chen, Weidong Wang, Changsheng You

Abstract: Mixed-precision quantization offers superior performance to fixed-precision quantization and has been widely used in signal processing, communication systems, and machine learning. Bit allocation is essential in mixed-precision quantization. Hence, in this paper, we propose a new bit allocation framework for mixed-precision quantization from a search perspective. First, we formulate a general bit allocation problem for mixed-precision quantization. Then we introduce the penalized particle swarm optimization (PPSO) algorithm to address the integer consumption constraint. To improve efficiency and avoid iterations on infeasible solutions within the PPSO algorithm, we propose a greedy criterion particle swarm optimization (GC-PSO) algorithm. The corresponding convergence analysis is derived based on dynamical system theory. Furthermore, we apply the framework to several classic fields, namely finite impulse response (FIR) filters, receivers, and gradient descent. Numerical examples in each application demonstrate the superiority of the proposed framework over existing algorithms.

* 15 pages, 10 figures
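
To make the idea of penalty-based particle swarm search for bit allocation concrete, here is a minimal sketch. It is not the paper's PPSO or GC-PSO algorithm. The distortion model (noise power proportional to 2^(-2b)), the linear penalty, the swarm coefficients, and all function names are assumptions chosen for illustration only.

```python
import random


def quantization_cost(bits, weights):
    # Assumed distortion model: a b-bit uniform quantizer contributes
    # noise power proportional to 2^(-2b), scaled by a per-component weight.
    return sum(w * 2.0 ** (-2 * b) for w, b in zip(weights, bits))


def penalized_cost(bits, weights, budget, rho):
    # Linear penalty on the number of bits spent beyond the budget (assumption).
    violation = max(0, sum(bits) - budget)
    return quantization_cost(bits, weights) + rho * violation


def pso_bit_allocation(weights, budget, b_min=1, b_max=8,
                       n_particles=30, n_iters=200, rho=10.0, seed=0):
    rng = random.Random(seed)
    dim = len(weights)

    def to_bits(x):
        # Particles move in continuous space; rounding gives integer bit widths.
        return [int(round(v)) for v in x]

    def score(x):
        return penalized_cost(to_bits(x), weights, budget, rho)

    pos = [[rng.uniform(b_min, b_max) for _ in range(dim)]
           for _ in range(n_particles)]
    # Seed one particle with the uniform allocation so a feasible point is known.
    pos[0] = [float(min(b_max, max(b_min, budget // dim)))] * dim
    vel = [[0.0] * dim for _ in range(n_particles)]

    pbest = [p[:] for p in pos]
    pbest_cost = [score(p) for p in pos]
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * r1 * (pbest[i][d] - pos[i][d])
                             + 1.5 * r2 * (gbest[d] - pos[i][d]))
                # Keep every coordinate inside the allowed bit-width range.
                pos[i][d] = min(b_max, max(b_min, pos[i][d] + vel[i][d]))
            c = score(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return to_bits(gbest), gbest_cost


if __name__ == "__main__":
    weights = [8.0, 4.0, 2.0, 1.0]  # hypothetical sensitivity of each component
    bits, cost = pso_bit_allocation(weights, budget=16)
    print(bits, cost)
```

In this sketch the penalty weight `rho` is chosen larger than the cost of the uniform allocation, so any allocation that exceeds the budget by even one bit scores worse than the seeded feasible particle and cannot become the global best.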

FullStack Bench: Evaluating LLMs as Full Stack Coders

Dec 03, 2024

Siyao Liu, He Zhu, Jerry Liu, Shulin Xin, Aoyan Li, Rui Long, Li Chen, Jack Yang, Jinxiang Xia, Z. Y. Peng (+7 more)

Abstract: As the capabilities of code large language models (LLMs) continue to expand, their applications across diverse code intelligence domains are rapidly increasing. However, most existing datasets only evaluate limited application domains. To address this gap, we have developed a comprehensive code evaluation dataset, FullStack Bench, focusing on full-stack programming, which encompasses a wide range of application domains (e.g., basic programming, data analysis, software engineering, mathematics, and machine learning). In addition, to assess multilingual programming capabilities, we design real-world instructions and corresponding unit test cases in 16 widely used programming languages to reflect real-world usage scenarios rather than simple translations. Moreover, we release an effective code sandbox execution tool (i.e., SandboxFusion) supporting various programming languages and packages to evaluate performance on FullStack Bench efficiently. Comprehensive experimental results demonstrate the necessity and effectiveness of FullStack Bench and SandboxFusion.

* 26 pages

Integrating Secondary Structures Information into Triangular Spatial Relationships (TSR) for Advanced Protein Classification

Nov 19, 2024

Poorya Khajouie, Titli Sarkar, Krishna Rauniyar, Li Chen, Wu Xu, Vijay Raghavan

Abstract: Protein structures represent the key to deciphering biological functions. The more detailed forms of similarity among proteins are sometimes overlooked by conventional structural comparison methods. In contrast, more advanced methods, such as Triangular Spatial Relationship (TSR), have been demonstrated to make finer differentiations. Still, the classical implementation of TSR does not provide for the integration of secondary structure information, which is important for a more detailed understanding of the folding pattern of a protein. To overcome this limitation, we developed the SSE-TSR approach. The proposed method integrates secondary structure elements (SSEs) into TSR-based protein representations. This allows an enriched representation of protein structures by considering 18 different combinations of helix, strand, and coil arrangements. Our results show that using SSEs improves the accuracy and reliability of protein classification to varying degrees. We worked with two large protein datasets of 9.2K and 7.8K samples, respectively. We applied the SSE-TSR approach and used a neural network model for classification. Introducing SSEs improved performance for Dataset 1, with accuracy moving from 96.0% to 98.3%. For Dataset 2, where performance was already good, further small improvements were found with the introduction of SSEs, giving an accuracy of 99.5% compared to 99.4%. These results show that SSE integration can dramatically improve TSR key discrimination, with significant benefits in datasets with low initial accuracies and only incremental gains in those with high baseline performance. Thus, SSE-TSR is a powerful bioinformatics tool that improves protein classification and understanding of protein function and interaction.

QCS:Feature Refining from Quadruplet Cross Similarity for Facial Expression Recognition

Nov 04, 2024

Chengpeng Wang, Li Chen, Lili Wang, Zhaofan Li, Xuebin Lv

Abstract: On facial expression datasets with complex and numerous feature types, where the significance and dominance of labeled features are difficult to predict, facial expression recognition (FER) encounters the challenges of inter-class similarity and intra-class variances, making it difficult to mine effective features. We aim to solely leverage the feature similarity among facial samples to address this. We introduce the Cross Similarity Attention (CSA), an input-output position-sensitive attention mechanism that harnesses feature similarity across different images to compute the corresponding global spatial attention. Based on this, we propose a four-branch circular framework, called Quadruplet Cross Similarity (QCS), to extract discriminative features from the same class and eliminate redundant ones from different classes synchronously to refine cleaner features. The symmetry of the network ensures balanced and stable training and reduces the amount of CSA interaction matrix. Contrastive residual distillation is utilized to transfer the information learned in the cross module back to the base network. The cross-attention module exists only during training, and only one base branch is retained during inference. Our proposed QCS model outperforms state-of-the-art methods on several popular FER datasets, without requiring additional landmark information or other extra training data. The code is available at https://github.com/birdwcp/QCS.

Class-RAG: Content Moderation with Retrieval Augmented Generation

Oct 18, 2024

Jianfa Chen, Emily Shen, Trupti Bavalatti, Xiaowen Lin, Yongkai Wang, Shuming Hu, Harihar Subramanyam, Ksheeraj Sai Vepuri, Ming Jiang, Ji Qi (+3 more)

Abstract: Robust content moderation classifiers are essential for the safety of Generative AI systems. Content moderation, or safety classification, is notoriously ambiguous: differences between safe and unsafe inputs are often extremely subtle, making it difficult for classifiers (and indeed, even humans) to properly distinguish violating vs. benign samples without further context or explanation. Furthermore, as these technologies are deployed across various applications and audiences, scaling risk discovery and mitigation through continuous model fine-tuning becomes increasingly challenging and costly. To address these challenges, we propose a Classification approach employing Retrieval-Augmented Generation (Class-RAG). Class-RAG extends the capability of its base LLM through access to a retrieval library which can be dynamically updated to enable semantic hotfixing for immediate, flexible risk mitigation. Compared to traditional fine-tuned models, Class-RAG demonstrates flexibility and transparency in decision-making. As evidenced by empirical studies, Class-RAG outperforms on classification and is more robust against adversarial attacks. In addition, our findings suggest that Class-RAG performance scales with retrieval library size, indicating that increasing the library size is a viable and low-cost approach to improving content moderation.

* 11 pages, submitted to ACL

Towards Synergistic, Generalized, and Efficient Dual-System for Robotic Manipulation

Oct 10, 2024

Qingwen Bu, Hongyang Li, Li Chen, Jisong Cai, Jia Zeng, Heming Cui, Maoqing Yao, Yu Qiao

Abstract: The increasing demand for versatile robotic systems to operate in diverse and dynamic environments has emphasized the importance of a generalist policy, which leverages a large cross-embodiment data corpus to facilitate broad adaptability and high-level reasoning. However, the generalist struggles with inefficient inference and costly training. The specialist policy, instead, is curated for specific domain data and excels at task-level precision with efficiency. Yet, it lacks the generalization capacity for a wide range of applications. Inspired by these observations, we introduce RoboDual, a synergistic dual-system that supplements the merits of both generalist and specialist policies. A diffusion transformer-based specialist is devised for multi-step action rollouts, conditioned on the high-level task understanding and discretized action output of a vision-language-action (VLA) based generalist. Compared to OpenVLA, RoboDual achieves a 26.7% improvement in real-world settings and a 12% gain on CALVIN by introducing a specialist policy with merely 20M trainable parameters. It maintains strong performance with only 5% of the demonstration data and enables a 3.8 times higher control frequency in real-world deployment. Code will be made publicly available. Our project page is hosted at: https://opendrivelab.com/RoboDual/

* Project page: https://opendrivelab.com/RoboDual/

Integrating Planning into Single-Turn Long-Form Text Generation

Oct 08, 2024

Yi Liang, You Wu, Honglei Zhuang, Li Chen, Jiaming Shen, Yiling Jia, Zhen Qin, Sumit Sanghai, Xuanhui Wang, Carl Yang (+1 more)

Abstract: Generating high-quality, in-depth textual documents, such as academic papers, news articles, Wikipedia entries, and books, remains a significant challenge for Large Language Models (LLMs). In this paper, we propose to use planning to generate long-form content. To achieve our goal, we generate intermediate steps via an auxiliary task that teaches the LLM to plan, reason, and structure before generating the final text. Our main novelty lies in a single auxiliary task that does not require multiple rounds of prompting or planning. To overcome the scarcity of training data for these intermediate steps, we leverage LLMs to generate synthetic intermediate writing data such as outlines, key information, and summaries from existing full articles. Our experiments on two datasets from different domains, namely the scientific news dataset SciNews and Wikipedia datasets in KILT-Wiki and FreshWiki, demonstrate that LLMs fine-tuned with the auxiliary task generate higher-quality documents. We observed a +2.5% improvement in ROUGE-Lsum and a strong 3.60 overall win/loss ratio via human SxS evaluation, with clear wins in organization, relevance, and verifiability.

Deep Transfer Learning-based Detection for Flash Memory Channels

Oct 08, 2024

Zhen Mei, Kui Cai, Long Shi, Jun Li, Li Chen, Kees A. Schouhamer Immink

Abstract: The NAND flash memory channel is corrupted by different types of noise, such as data retention noise and wear-out noise, which lead to an unknown channel offset and make the flash memory channel non-stationary. In the literature, machine learning-based methods have been proposed for data detection for flash memory channels. However, these methods require a large number of training samples and labels to achieve satisfactory performance, which is costly. Furthermore, with a large unknown channel offset, it may be impossible to obtain enough correct labels. In this paper, we reformulate the data detection for the flash memory channel as a transfer learning (TL) problem. We then propose a model-based deep TL (DTL) algorithm for flash memory channel detection. It can effectively reduce the training data size from $10^6$ samples to fewer than $10^4$ samples. Moreover, we propose an unsupervised domain adaptation (UDA)-based DTL algorithm using moment alignment, which can detect data without any labels. Hence, it is suitable for scenarios where the decoding of the error-correcting code fails and no labels can be obtained. Finally, a UDA-based threshold detector is proposed to eliminate the need for a neural network. Both the channel raw error rate analysis and simulation results demonstrate that the proposed DTL-based detection schemes can achieve near-optimal bit error rate (BER) performance with much less training data and/or without using any labels.

* This paper has been accepted for publication in IEEE Transactions on Communications
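
As a rough illustration of how moment alignment can support label-free threshold detection, the sketch below matches the mean and standard deviation of unlabeled, drifted readings to those of a reference set and then reuses the reference threshold. This is not the paper's detector. The two-level cell, the affine drift model, the threshold value, and the function names are all assumptions, and the approach presumes both sets contain a similar mix of stored levels.

```python
import statistics


def align_moments(target, source):
    # Match the first two moments (mean and standard deviation) of the
    # unlabeled target readings to those of the source readings.
    mu_s, sd_s = statistics.fmean(source), statistics.pstdev(source)
    mu_t, sd_t = statistics.fmean(target), statistics.pstdev(target)
    return [(v - mu_t) / sd_t * sd_s + mu_s for v in target]


def threshold_detect(voltages, threshold):
    # Hard decision: level index 1 if the reading exceeds the threshold, else 0.
    return [1 if v > threshold else 0 for v in voltages]


if __name__ == "__main__":
    # Hypothetical reference readings of a two-level cell and its read threshold.
    source = [0.9, 1.1, 1.0, 2.9, 3.1, 3.0]
    threshold = 2.0

    # Hypothetical drifted channel: unknown gain and offset applied to the readings.
    target = [0.8 * v - 0.5 for v in source]

    print(threshold_detect(target, threshold))                          # reference threshold on raw drifted data
    print(threshold_detect(align_moments(target, source), threshold))   # after moment alignment
```

Because the assumed drift is affine, aligning the first two moments undoes it exactly in this toy case. Real channel noise is not affine, so the sketch only conveys why no labels are needed for the alignment step.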

Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild

Oct 07, 2024

Xinyu Zhao, Guoheng Sun, Ruisi Cai, Yukun Zhou, Pingzhi Li, Peihao Wang, Bowen Tan, Yexiao He, Li Chen, Yi Liang (+6 more)

Abstract: As Large Language Models (LLMs) excel across tasks and specialized domains, scaling LLMs based on existing models has garnered significant attention, but it faces the challenge of decreasing performance when combining disparate models. Various techniques have been proposed for the aggregation of pre-trained LLMs, including model merging, Mixture-of-Experts, and stacking. Despite their merits, a comprehensive comparison and synergistic application of them to a diverse model zoo is yet to be adequately addressed. In light of this research gap, this paper introduces Model-GLUE, a holistic LLM scaling guideline. First, our work starts with a benchmarking of existing LLM scaling techniques, especially selective merging and variants of mixture. Utilizing the insights from the benchmark results, we formulate a strategy for the selection and aggregation of a heterogeneous model zoo characterizing different architectures and initialization. Our methodology involves the clustering of mergeable models, optimal merging strategy selection, and the integration of clusters through a model mixture. Finally, evidenced by our experiments on a diverse Llama-2-based model zoo, Model-GLUE shows an average performance enhancement of 5.61%, achieved without additional training. Code is available at: https://github.com/Model-GLUE/Model-GLUE.

* 24 pages, 4 figures, accepted to NeurIPS 2024 Datasets and Benchmarks Track

Reasoning Multi-Agent Behavioral Topology for Interactive Autonomous Driving

Sep 26, 2024

Haochen Liu, Li Chen, Yu Qiao, Chen Lv, Hongyang Li

Abstract: Autonomous driving systems aim for safe and socially consistent driving through the behavioral integration among interactive agents. However, challenges remain due to multi-agent scene uncertainty and heterogeneous interaction. Current dense and sparse behavioral representations struggle with inefficiency and inconsistency in multi-agent modeling, leading to instability of collective behavioral patterns when integrating prediction and planning (IPP). To address this, we initiate a topological formation that serves as a compliant behavioral foreground to guide downstream trajectory generations. Specifically, we introduce Behavioral Topology (BeTop), a pivotal topological formulation that explicitly represents the consensual behavioral pattern among multi-agent futures. BeTop is derived from braid theory to distill compliant interactive topology from multi-agent future trajectories. A synergistic learning framework (BeTopNet) supervised by BeTop facilitates the consistency of behavior prediction and planning within the predicted topology priors. Through imitative contingency learning, BeTop also effectively manages behavioral uncertainty for prediction and planning. Extensive verification on large-scale real-world datasets, including nuPlan and WOMD, demonstrates that BeTop achieves state-of-the-art performance in both prediction and planning tasks. Further validations on the proposed interactive scenario benchmark showcase planning compliance in interactive cases.
