Jie Zhu

Commenting Higher-level Code Unit: Full Code, Reduced Code, or Hierarchical Code Summarization

Mar 13, 2025

Weisong Sun, Yiran Zhang, Jie Zhu, Zhihui Wang, Chunrong Fang, Yonglong Zhang, Yebo Feng, Jiangping Huang, Xingya Wang, Zhi Jin (+1 more)

Abstract: Commenting code is a crucial activity in software development, as it facilitates future maintenance and updates. To improve the efficiency of writing comments and reduce developers' workload, researchers have proposed various automated code summarization (ACS) techniques that automatically generate comments/summaries for given code units. However, these techniques focus primarily on method-level code units. Research on summarizing higher-level code units, such as file-level and module-level units, is scarce, even though summaries of such units are highly useful for quickly gaining a macro-level understanding of software components and architecture. To fill this gap, we conduct a systematic study of how to use LLMs to comment higher-level code units at the file and module levels. These units are significantly larger than method-level ones, which makes it challenging to handle long code inputs within LLM input-length limits while maintaining efficiency. To address these issues, we explore summarization strategies for higher-level code units in three categories: full code summarization, reduced code summarization, and hierarchical code summarization. The experimental results suggest that for file-level code units, using the full code is the most effective approach, with reduced code serving as a cost-efficient alternative. For module-level code units, however, hierarchical code summarization is the most promising strategy. In addition, inspired by research on method-level ACS, we investigate using an LLM as an evaluator of the quality of summaries of higher-level code units. The experimental results show that the LLM's evaluations strongly correlate with human evaluations.
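
The hierarchical strategy can be pictured as a bottom-up pass: summarize each file first, then summarize the collected file summaries into a module summary. The Python sketch below illustrates only that control flow under stated assumptions and is not the authors' pipeline: `stub_llm` is a hypothetical stand-in for a real LLM call (it just collapses whitespace and truncates), and the file contents are made up.

```python
from typing import Callable, Dict, List


def stub_llm(prompt: str, max_chars: int = 120) -> str:
    """Hypothetical placeholder for an LLM call.

    It only collapses whitespace and truncates to `max_chars`, so the
    example runs offline; a real pipeline would send `prompt` to a model.
    """
    return " ".join(prompt.split())[:max_chars]


def summarize_module(files: Dict[str, str],
                     llm: Callable[[str], str] = stub_llm) -> str:
    """Bottom-up (hierarchical) summarization of one module.

    Level 1: each file is summarized on its own, so no single call
    has to hold the whole module.
    Level 2: the module summary is produced from the file summaries only.
    """
    file_summaries: List[str] = [
        f"{name}: {llm(source)}" for name, source in sorted(files.items())
    ]
    return llm("\n".join(file_summaries))


if __name__ == "__main__":
    module = {
        "io.py": "# Reads config files\ndef load(): ...",
        "core.py": "# Runs the main loop\ndef run(): ...",
    }
    print(summarize_module(module))
```

Replacing `stub_llm` with a real model client keeps the same structure. By contrast, the full-code strategy would send the entire module in one call and the reduced-code strategy a pruned version of it, which is where the input-length limits mentioned in the abstract come into play.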

Speech Translation Refinement using Large Language Models

Jan 25, 2025

Huaixia Dou, Xinyu Tian, Xinglin Lyu, Jie Zhu, Junhui Li, Lifan Guo

Abstract: Recent advancements in large language models (LLMs) have demonstrated their remarkable capabilities across various language tasks. Inspired by the success of text-to-text translation refinement, this paper investigates how LLMs can improve speech translation by introducing a joint refinement process. By jointly refining the speech translation (ST) output and the automatic speech recognition (ASR) transcription with LLMs, the performance of the ST model is significantly improved in both the training-free in-context learning and the parameter-efficient fine-tuning scenarios. Additionally, we explore the effect of document-level context on refinement in a context-aware fine-tuning scenario. Experimental results on the MuST-C and CoVoST 2 datasets, which include seven translation tasks, demonstrate the effectiveness of the proposed approach with several popular LLMs, including GPT-3.5-turbo, LLaMA3-8B, and Mistral-12B. Further analysis suggests that jointly refining both transcription and translation yields better performance than refining translation alone. Meanwhile, incorporating document-level context significantly enhances refinement performance. We release our code and datasets on GitHub.

Speech Enhancement with Overlapped-Frame Information Fusion and Causal Self-Attention

Jan 21, 2025

Yuewei Zhang, Huanbin Zou, Jie Zhu

Abstract: For time-frequency (TF) domain speech enhancement (SE) methods, the overlap-and-add operation in the inverse TF transformation inevitably introduces an algorithmic delay equal to the window size. However, typical causal SE systems fail to exploit the future speech information available within this inherent delay, which limits SE performance. In this paper, we propose an overlapped-frame information fusion scheme: at each frame index, we construct several pseudo overlapped frames, fuse them with the original speech frame, and feed the fused result to the SE model. Additionally, we introduce a causal time-frequency-channel attention (TFCA) block to boost the representation capability of the neural network. This block processes the intermediate feature maps in parallel with self-attention-based operations along the time, frequency, and channel dimensions. Experiments demonstrate the benefits of these improvements, and the proposed SE system outperforms current advanced methods.

* Accepted by ICASSP 2025

MoLE: Enhancing Human-centric Text-to-image Diffusion via Mixture of Low-rank Experts

Oct 30, 2024

Jie Zhu, Yixiong Chen, Mingyu Ding, Ping Luo, Leye Wang, Jingdong Wang

Abstract: Text-to-image diffusion has attracted vast attention due to its impressive image-generation capabilities. However, in human-centric text-to-image generation, particularly for faces and hands, the results often lack naturalness due to insufficient training priors. We alleviate this issue from two perspectives. 1) On the data side, we carefully collect a human-centric dataset comprising over one million high-quality human-in-the-scene images, plus two specific sets of close-up images of faces and hands. Together, these datasets provide a rich prior knowledge base that enhances the human-centric image generation capabilities of the diffusion model. 2) On the methodological side, we propose a simple yet effective method called Mixture of Low-rank Experts (MoLE), which treats low-rank modules trained separately on close-up hand and face images as experts. This idea draws on our observation of low-rank refinement: a low-rank module trained on a customized close-up dataset has the potential to enhance the corresponding image part when applied at an appropriate scale. To validate the superiority of MoLE over state-of-the-art methods in human-centric image generation, we construct two benchmarks and perform evaluations with diverse metrics and human studies. Datasets, model, and code are released at https://sites.google.com/view/mole4diffuser/.

* Published at NeurIPS 2024
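
The core of the MoLE idea, a frozen base layer plus low-rank experts (e.g., one for faces, one for hands) whose updates are mixed by gate weights and applied at a chosen scale, can be sketched for a single linear layer. This is a minimal pure-Python illustration, not the released MoLE code: the softmax gate over a linear projection of the input, the `scale` argument, and all matrix values are assumptions made for the example.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Dense matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(z: Sequence[float]) -> Vector:
    peak = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def mole_layer(x: Sequence[float],
               W: Matrix,
               experts: List[Tuple[Matrix, Matrix]],
               gate: Matrix,
               scale: float = 1.0) -> Vector:
    """y = W x + scale * sum_k g_k(x) * B_k (A_k x).

    W:       frozen base weight (d_out x d_in).
    experts: (A_k, B_k) pairs; A_k is r x d_in and B_k is d_out x r,
             e.g. one expert tuned on face close-ups, one on hands.
    gate:    K x d_in matrix giving one logit per expert (hypothetical
             soft gating; the paper's assignment scheme may differ).
    """
    y = matvec(W, x)
    weights = softmax(matvec(gate, x))
    for g, (A, B) in zip(weights, experts):
        delta = matvec(B, matvec(A, x))  # rank-r update B_k A_k x
        y = [yi + scale * g * di for yi, di in zip(y, delta)]
    return y


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]
    face = ([[1.0, 1.0]], [[1.0], [0.0]])  # hypothetical rank-1 expert
    hand = ([[1.0, 1.0]], [[0.0], [1.0]])  # hypothetical rank-1 expert
    print(mole_layer([1.0, 2.0], W, [face, hand], gate=[[0.0, 0.0], [0.0, 0.0]]))
```

With `scale=0` the layer falls back to the base model; raising it strengthens the experts' contribution, which mirrors the abstract's remark that a low-rank module is useful when applied at an appropriate scale.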

Benchmarking Large Language Models on CFLUE -- A Chinese Financial Language Understanding Evaluation Dataset

May 17, 2024

Jie Zhu, Junhui Li, Yalong Wen, Lifan Guo

Abstract: In light of recent breakthroughs in large language models (LLMs) that have revolutionized natural language processing (NLP), there is an urgent need for new benchmarks that keep pace with the rapid development of LLMs. In this paper, we propose CFLUE, the Chinese Financial Language Understanding Evaluation benchmark, designed to assess the capabilities of LLMs across various dimensions. Specifically, CFLUE provides datasets tailored for both knowledge assessment and application assessment. For knowledge assessment, it contains 38K+ multiple-choice questions with associated solution explanations, which serve two purposes: answer prediction and question reasoning. For application assessment, CFLUE features 16K+ test instances across distinct groups of NLP tasks, such as text classification, machine translation, relation extraction, reading comprehension, and text generation. On CFLUE, we conduct a thorough evaluation of representative LLMs. The results reveal that only GPT-4 and GPT-4-turbo exceed 60% accuracy in answer prediction for knowledge assessment, suggesting that there is still substantial room for improvement in current LLMs. In application assessment, although GPT-4 and GPT-4-turbo are the top two performers, their advantage over lightweight LLMs is noticeably smaller. The datasets and scripts associated with CFLUE are openly accessible at https://github.com/aliyun/cflue.

* Accepted by the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024)

A Unified Membership Inference Method for Visual Self-supervised Encoder via Part-aware Capability

Apr 03, 2024

Jie Zhu, Jirong Zha, Ding Li, Leye Wang

Abstract: Self-supervised learning shows promise in harnessing extensive unlabeled data, but it also raises significant privacy concerns, especially in vision. In this paper, we aim to perform membership inference on visual self-supervised models in a more realistic setting: the adversary does not know the self-supervised training method or its details, since in practice they usually face a black-box system. In this setting, a self-supervised model could have been trained with completely different paradigms, e.g., masked image modeling or contrastive learning, with complex training details. We therefore propose a unified membership inference method called PartCrop. It is motivated by the part-aware capability shared among models and by their stronger part response on training data. Specifically, PartCrop crops parts of objects in an image and uses them to query responses with the image in representation space. We conduct extensive attacks on self-supervised models with different training protocols and structures using three widely used image datasets. The results verify the effectiveness and generalization of PartCrop. Moreover, to defend against PartCrop, we evaluate two common approaches, i.e., early stopping and differential privacy, and propose a tailored method called shrinking crop scale range. The defense experiments indicate that all of them are effective. Our code is available at https://github.com/JiePKU/PartCrop.

* Membership Inference, Self-supervised learning
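
One way to read "crops parts of objects in an image to query responses with the image in representation space" is as a similarity score between the full-image embedding and the embeddings of its part crops, with higher scores suggesting training membership. The sketch below follows that reading under loud assumptions: `toy_encoder` is a hypothetical stand-in for the black-box self-supervised encoder, the crop boxes are fixed by hand, and the threshold rule is illustrative. PartCrop's actual crop sampling and scoring may differ.

```python
import math
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]
Box = Tuple[int, int, int, int]  # (top, left, height, width)
Encoder = Callable[[Image], List[float]]


def crop(img: Image, box: Box) -> Image:
    top, left, h, w = box
    return [row[left:left + w] for row in img[top:top + h]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return dot / norm if norm else 0.0


def part_response_score(img: Image, boxes: Sequence[Box], encoder: Encoder) -> float:
    """Mean similarity between the full-image embedding and its part crops.

    Under the PartCrop intuition, training members respond more strongly
    to their own parts, i.e. they should score higher.
    """
    full = encoder(img)
    sims = [cosine(full, encoder(crop(img, b))) for b in boxes]
    return sum(sims) / len(sims)


def infer_membership(score: float, threshold: float) -> bool:
    # Hypothetical decision rule; the threshold would need calibration.
    return score >= threshold


def toy_encoder(img: Image) -> List[float]:
    """Hypothetical stand-in for a black-box SSL encoder: [mean, max]."""
    pixels = [p for row in img for p in row]
    return [sum(pixels) / len(pixels), max(pixels)]
```

In a real attack the threshold would have to be calibrated, for instance on images known to be outside the training set, and `toy_encoder` would be replaced by queries to the target model.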

A Two-Stage Framework in Cross-Spectrum Domain for Real-Time Speech Enhancement

Jan 19, 2024

Yuewei Zhang, Huanbin Zou, Jie Zhu

Abstract: Two-stage pipelines are popular in speech enhancement because they outperform traditional single-stage methods. Current two-stage approaches usually enhance the magnitude spectrum in the first stage and then modify the complex spectrum in the second stage to suppress residual noise and recover the speech phase. This whole process is performed in the short-time Fourier transform (STFT) spectrum domain. In this paper, we re-implement the second stage in the short-time discrete cosine transform (STDCT) spectrum domain, because we have found that STDCT has greater noise suppression capability than STFT. Additionally, the implicit phase of STDCT makes phase recovery simpler and more efficient, whereas it is challenging and computationally expensive in STFT-based methods. We therefore propose a novel two-stage framework, the STFT-STDCT spectrum fusion network (FDFNet), for speech enhancement in the cross-spectrum domain. Experimental results demonstrate that FDFNet outperforms previous two-stage methods and also performs better than other advanced systems.

* Accepted by ICASSP 2024
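
The STDCT of a real-valued frame is itself real-valued, so information that the STFT splits into magnitude and phase is carried here by the sign and size of each coefficient; this is one way to read the abstract's "implicit phase". The sketch below shows that property with a plain orthonormal DCT-II and its inverse on a single frame. It is illustrative only: framing, windowing, overlap-add, and anything specific to FDFNet are omitted.

```python
import math
from typing import List, Sequence


def _scale(k: int, n: int) -> float:
    # Orthonormal scaling: sqrt(1/N) for k = 0, sqrt(2/N) otherwise.
    return math.sqrt((1.0 if k == 0 else 2.0) / n)


def dct(frame: Sequence[float]) -> List[float]:
    """Orthonormal DCT-II of one real-valued frame (no window, for clarity)."""
    n = len(frame)
    return [
        _scale(k, n) * sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                           for i, x in enumerate(frame))
        for k in range(n)
    ]


def idct(coeffs: Sequence[float]) -> List[float]:
    """Inverse transform (orthonormal DCT-III), i.e. the transpose of the DCT-II matrix."""
    n = len(coeffs)
    return [
        sum(_scale(k, n) * c * math.cos(math.pi * (i + 0.5) * k / n)
            for k, c in enumerate(coeffs))
        for i in range(n)
    ]


if __name__ == "__main__":
    frame = [0.5, -1.0, 2.0, 0.25]
    coeffs = dct(frame)  # real numbers: sign and magnitude, no separate phase
    print(coeffs)
    print(idct(coeffs))  # recovers the frame up to floating-point error
```

Because the coefficients are real, a real-valued network can predict them (or a mask on them) directly, and the inverse transform needs no separate phase estimate.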

Safety and Performance, Why Not Both? Bi-Objective Optimized Model Compression against Heterogeneous Attacks Toward AI Software Deployment

Jan 02, 2024

Jie Zhu, Leye Wang, Xiao Han, Anmin Liu, Tao Xie

Abstract: The size of deep learning models in artificial intelligence (AI) software is increasing rapidly, hindering large-scale deployment on resource-restricted devices (e.g., smartphones). To mitigate this issue, AI software compression, which aims to reduce model size while maintaining high performance, plays a crucial role. However, intrinsic defects in a big model may be inherited by the compressed one. Adversaries can easily exploit such defects, since a compressed model is usually deployed on a large number of devices without adequate protection. In this article, we address the safe model compression problem from the perspective of safety-performance co-optimization. Specifically, inspired by the test-driven development (TDD) paradigm in software engineering, we propose a test-driven sparse training framework called SafeCompress. By simulating the attack mechanism as safety testing, SafeCompress automatically compresses a big model into a small one following the dynamic sparse training paradigm. Then, considering two representative and heterogeneous attack mechanisms, i.e., the black-box membership inference attack and the white-box membership inference attack, we develop two concrete instances called BMIA-SafeCompress and WMIA-SafeCompress. Further, we implement a third instance, MMIA-SafeCompress, by extending SafeCompress to defend against adversaries that conduct black-box and white-box membership inference attacks simultaneously. We conduct extensive experiments on five datasets covering both computer vision and natural language processing tasks. The results show the effectiveness and generalizability of our framework. We also discuss how to adapt SafeCompress to attacks other than membership inference, demonstrating its flexibility.

* Accepted by IEEE Transactions on Software Engineering (TSE). Camera-ready Version. arXiv admin note: substantial text overlap with arXiv:2208.05969

Magnitude-and-phase-aware Speech Enhancement with Parallel Sequence Modeling

Oct 11, 2023

Yuewei Zhang, Huanbin Zou, Jie Zhu

Abstract: In speech enhancement (SE), phase estimation is important for perceptual quality, so many methods take the complex short-time Fourier transform (STFT) spectrum of clean speech or the complex ideal ratio mask (cIRM) as the learning target. To predict these complex targets, the common solution is to design a complex-valued neural network, or to use a real-valued network that predicts the real and imaginary parts of the target separately. In this paper, we instead propose using a real-valued network to estimate a magnitude mask and a normalized cIRM. This not only avoids the significant increase in model complexity caused by complex networks but also outperforms previous phase estimation methods. Meanwhile, we devise a parallel sequence modeling (PSM) block to improve the RNN block in the convolutional recurrent network (CRN)-based SE model. We name our method the magnitude-and-phase-aware and PSM-based CRN (MPCRN). The experimental results show that MPCRN achieves superior SE performance.

* Accepted by ASRU 2023
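
Splitting the target into a real-valued magnitude mask and a phase-only complex mask can be checked on a single time-frequency bin. In the sketch below, "normalized cIRM" is assumed to mean the cIRM divided by its modulus; the abstract does not spell out MPCRN's exact normalization. With oracle masks computed from the clean and noisy STFT values, applying both masks recovers the clean bin; a trained real-valued network would instead predict the two masks.

```python
from typing import Tuple


def oracle_masks(clean: complex, noisy: complex) -> Tuple[float, complex]:
    """Ideal magnitude mask and unit-modulus (normalized) cIRM for one TF bin."""
    if noisy == 0:
        return 0.0, 1 + 0j  # nothing to scale; leave the phase untouched
    cirm = clean / noisy                  # complex ideal ratio mask
    mag_mask = abs(clean) / abs(noisy)    # real-valued magnitude mask
    # Assumed normalization: keep only the phase of the cIRM.
    norm_cirm = cirm / abs(cirm) if cirm != 0 else 1 + 0j
    return mag_mask, norm_cirm


def apply_masks(noisy: complex, mag_mask: float, norm_cirm: complex) -> complex:
    # Magnitude comes from mag_mask * |noisy|; phase is angle(noisy) + angle(norm_cirm).
    return mag_mask * noisy * norm_cirm


if __name__ == "__main__":
    clean, noisy = 0.3 + 0.4j, 1.0 - 0.5j
    m, c = oracle_masks(clean, noisy)
    print(m, c, apply_masks(noisy, m, c))
```

The algebra behind the check: (|S|/|Y|) · Y · (S/Y)/|S/Y| = S, so the magnitude mask restores |S| and the normalized cIRM rotates the noisy phase onto the clean phase.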

VSANet: Real-time Speech Enhancement Based on Voice Activity Detection and Causal Spatial Attention

Oct 11, 2023

Yuewei Zhang, Huanbin Zou, Jie Zhu

Abstract: Deep learning-based speech enhancement (SE) methods typically take the waveform or time-frequency spectrum of clean speech as the learning target and train a deep neural network (DNN) by minimizing the loss between the DNN's output and the target. This conventional single-task learning paradigm has proven effective, but we find that a multi-task learning framework can further improve SE performance. Specifically, we design a framework containing an SE module and a voice activity detection (VAD) module that share the same encoder, and we optimize the whole network with a weighted sum of the two modules' losses. Moreover, we design a causal spatial attention (CSA) block to strengthen the representation capability of the DNN. Combining the VAD-aided multi-task learning framework and the CSA block, we name our SE network VSANet. The experimental results demonstrate the benefits of multi-task learning and the CSA block, which give VSANet excellent SE performance.

* Accepted by ASRU 2023
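
The multi-task objective can be written as L = L_SE + λ·L_VAD, where both modules share one encoder. The sketch below computes that weighted loss for one utterance in plain Python. The specific choices are assumptions, since the abstract only says the network is trained with a weighted loss of the two modules: mean squared error for the SE target, frame-level binary cross-entropy for VAD, and λ = 0.1 (`vad_weight`, a hypothetical value). The shared encoder and the CSA block are not modeled.

```python
import math
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between the enhanced output and the clean target."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def bce(probs: Sequence[float], labels: Sequence[int], eps: float = 1e-7) -> float:
    """Binary cross-entropy for frame-level speech/non-speech labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def multitask_loss(se_pred: Sequence[float], se_target: Sequence[float],
                   vad_probs: Sequence[float], vad_labels: Sequence[int],
                   vad_weight: float = 0.1) -> float:
    """Weighted sum of the SE loss and the VAD loss (weight is hypothetical)."""
    return mse(se_pred, se_target) + vad_weight * bce(vad_probs, vad_labels)
```

Setting `vad_weight` to 0 reduces the objective to conventional single-task SE training, which is the baseline the abstract compares against.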
