Wei Zhang

Alibaba Group

Beyond Content Relevance: Evaluating Instruction Following in Retrieval Models

Oct 31, 2024

Jianqun Zhou, Yuanlei Zheng, Wei Chen, Qianqian Zheng, Zeyuan Shang, Wei Zhang, Rui Meng, Xiaoyu Shen

Abstract: Instruction-following capabilities in large language models (LLMs) have significantly progressed, enabling more complex user interactions through detailed prompts. However, retrieval systems have not matched these advances: most still rely on traditional lexical and semantic matching techniques that fail to fully capture user intent. Recent efforts have introduced instruction-aware retrieval models, but these primarily focus on intrinsic content relevance and neglect customized preferences for broader document-level attributes. This study evaluates the instruction-following capabilities of various retrieval models beyond content relevance, including LLM-based dense retrieval and reranking models. We develop InfoSearch, a novel retrieval evaluation benchmark spanning six document-level attributes: Audience, Keyword, Format, Language, Length, and Source, and introduce two novel metrics, Strict Instruction Compliance Ratio (SICR) and Weighted Instruction Sensitivity Evaluation (WISE), to accurately assess the models' responsiveness to instructions. Our findings reveal that while reranking models generally surpass retrieval models in instruction following, they still face challenges in handling certain attributes. Moreover, although instruction fine-tuning and increased model size lead to better performance, most models fall short of achieving comprehensive instruction compliance as assessed by our benchmark.

Integration of Communication and Computational Imaging

Oct 29, 2024

Zhenming Yu, Liming Cheng, Hongyu Huang, Wei Zhang, Liang Lin, Kun Xu

Abstract: Communication enables the expansion of human visual perception beyond the limitations of time and distance, while computational imaging overcomes the constraints of depth and breadth. Although impressive achievements have been witnessed with the two types of technologies, the obstructed information flow between the two domains is a bottleneck hindering their further progress. Herein, we propose a novel framework that integrates communication and computational imaging (ICCI) to break through the inherent isolation between communication and computational imaging for remote perception. By jointly considering the sensing and transmitting of remote visual information, the ICCI framework performs a full-link information transfer optimization, aiming to minimize information loss from the generation of the information source to the execution of the final vision tasks. We conduct numerical analysis and experiments to demonstrate the ICCI framework by integrating communication systems and snapshot compressive imaging systems. Compared with straightforward combination schemes, which sequentially execute sensing and transmitting, the ICCI scheme shows greater robustness against channel noise and impairments while achieving higher data compression. Moreover, an 80 km 27-band hyperspectral video perception with a rate of 30 fps is experimentally achieved. This new ICCI remote perception paradigm offers a high-efficiency solution for various real-time computer vision tasks.

CLAP. I. Resolving miscalibration for deep learning-based galaxy photometric redshift estimation

Oct 25, 2024

Qiufan Lin, Hengxin Ruan, Dominique Fouchez, Shupei Chen, Rui Li, Paulo Montero-Camacho, Nicola R. Napolitano, Yuan-Sen Ting, Wei Zhang

Abstract: Obtaining well-calibrated photometric redshift probability densities for galaxies without a spectroscopic measurement remains a challenge. Deep learning discriminative models, typically fed with multi-band galaxy images, can produce outputs that mimic probability densities and achieve state-of-the-art accuracy. However, such models may be affected by miscalibration that would result in discrepancies between the model outputs and the actual distributions of true redshifts. Our work develops a novel method called the Contrastive Learning and Adaptive KNN for Photometric Redshift (CLAP) that resolves this issue. It leverages supervised contrastive learning (SCL) and k-nearest neighbours (KNN) to construct and calibrate raw probability density estimates, and implements a refitting procedure to resume end-to-end discriminative models ready to produce final estimates for large-scale imaging data. The harmonic mean is adopted to combine an ensemble of estimates from multiple realisations for improving accuracy. Our experiments demonstrate that CLAP takes advantage of both deep learning and KNN, outperforming benchmark methods on the calibration of probability density estimates and retaining high accuracy and computational efficiency. With reference to CLAP, we point out that miscalibration is particularly sensitive to the method-induced excessive correlations among data instances in addition to the unaccounted-for epistemic uncertainties. Reducing the uncertainties may not guarantee the removal of miscalibration due to the presence of such excessive correlations, yet this is a problem for conventional deep learning methods rather than CLAP. These discussions underscore the robustness of CLAP for obtaining photometric redshift probability densities required by astrophysical and cosmological applications. This is the first paper in our series on CLAP.

* 22 + 6 pages, 9 + 5 figures
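
The CLAP abstract mentions combining an ensemble of estimates with the harmonic mean. The following is a minimal sketch of that idea, not the authors' implementation. It assumes each realisation yields a discretised probability density over the same redshift bins, and that the combination is a bin-wise harmonic mean followed by renormalisation. The function name, the `eps` floor, and the toy numbers are hypothetical.

```python
def harmonic_mean_pdf(pdfs, eps=1e-12):
    """Combine discretised densities bin by bin with the harmonic mean.

    pdfs: list of equal-length lists, one density per ensemble member.
    eps:  assumed floor that avoids division by zero in empty bins.
    """
    n_models = len(pdfs)
    n_bins = len(pdfs[0])
    combined = []
    for b in range(n_bins):
        inv_sum = sum(1.0 / max(p[b], eps) for p in pdfs)
        combined.append(n_models / inv_sum)
    # Renormalise so the combined estimate sums to one again.
    total = sum(combined)
    return [c / total for c in combined]


# Toy ensemble of three realisations over three redshift bins (made-up values).
ensemble = [
    [0.1, 0.6, 0.3],
    [0.2, 0.5, 0.3],
    [0.1, 0.7, 0.2],
]
combined = harmonic_mean_pdf(ensemble)
```

Because the harmonic mean is dominated by the smallest value in each bin, a bin keeps high probability only when every ensemble member supports it.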

Calibrating Deep Neural Network using Euclidean Distance

Oct 23, 2024

Wenhao Liang, Chang Dong, Liangwei Zheng, Zhengyang Li, Wei Zhang, Weitong Chen

Abstract: Uncertainty is a fundamental aspect of real-world scenarios, where perfect information is rarely available. Humans naturally develop complex internal models to navigate incomplete data and effectively respond to unforeseen or partially observed events. In machine learning, Focal Loss is commonly used to reduce misclassification rates by emphasizing hard-to-classify samples. However, it does not guarantee well-calibrated predicted probabilities and may result in models that are overconfident or underconfident. High calibration error indicates a misalignment between predicted probabilities and actual outcomes, affecting model reliability. This research introduces a novel loss function called Focal Calibration Loss (FCL), designed to improve probability calibration while retaining the advantages of Focal Loss in handling difficult samples. By minimizing the Euclidean norm through a strictly proper loss, FCL penalizes the instance-wise calibration error and constrains bounds. We provide theoretical validation for the proposed method and apply it to calibrate CheXNet for potential deployment in web-based health-care systems. Extensive evaluations on various models and datasets demonstrate that our method achieves SOTA performance in both calibration and accuracy metrics.
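
One plausible reading of the loss described in this abstract is the standard focal loss plus a squared Euclidean distance between the predicted probability vector and the one-hot label (a Brier-style term, which is a strictly proper scoring rule). The sketch below follows that reading. The exact form of FCL, the weight `lam`, and the default `gamma` are assumptions, since the abstract does not state them.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def focal_loss(probs, target, gamma=2.0):
    """Standard focal loss for one sample: -(1 - p_t)^gamma * log(p_t)."""
    p_t = max(probs[target], 1e-12)  # clamp to keep the log finite
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def euclidean_calibration_term(probs, target):
    """Squared Euclidean distance between probs and the one-hot label."""
    return sum(
        (p - (1.0 if k == target else 0.0)) ** 2 for k, p in enumerate(probs)
    )


def focal_calibration_loss(logits, target, gamma=2.0, lam=1.0):
    """Assumed form: focal loss + lam * squared Euclidean calibration term."""
    probs = softmax(logits)
    return focal_loss(probs, target, gamma) + lam * euclidean_calibration_term(
        probs, target
    )


# A confident correct prediction versus a confident wrong one.
confident = focal_calibration_loss([4.0, 0.0, 0.0], target=0)
wrong = focal_calibration_loss([0.0, 4.0, 0.0], target=0)
```

With `lam=0.0` the function reduces to plain focal loss, which makes it easy to compare the two objectives on the same logits.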

Corrected Soft Actor Critic for Continuous Control

Oct 22, 2024

Yanjun Chen, Xinming Zhang, Xianghui Wang, Zhiqiang Xu, Xiaoyu Shen, Wei Zhang

Abstract: The Soft Actor-Critic (SAC) algorithm is known for its stability and high sample efficiency in deep reinforcement learning. However, the tanh transformation applied to sampled actions in SAC distorts the action distribution, hindering the selection of the most probable actions. This paper presents a novel action sampling method that directly identifies and selects the most probable actions within the transformed distribution, thereby addressing this issue. Extensive experiments on standard continuous control benchmarks demonstrate that the proposed method significantly enhances SAC's performance, resulting in faster convergence and higher cumulative rewards compared to the original algorithm.
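
The distortion mentioned in this abstract can be illustrated with the change-of-variables formula. If u ~ Normal(mu, sigma^2) and a = tanh(u), the density of a is the Gaussian density at atanh(a) divided by (1 - a^2), so its mode generally differs from tanh(mu). The sketch below finds the mode by a simple grid search. It is only an illustration under those assumptions: the abstract does not describe the paper's actual sampling method, and all names and values here are hypothetical.

```python
import math


def squashed_log_density(a, mu, sigma):
    """Log-density of a = tanh(u) with u ~ Normal(mu, sigma^2)."""
    u = math.atanh(a)
    log_normal = -0.5 * ((u - mu) / sigma) ** 2 - math.log(
        sigma * math.sqrt(2.0 * math.pi)
    )
    # Jacobian correction: da/du = 1 - tanh(u)^2 = 1 - a^2.
    return log_normal - math.log(1.0 - a * a)


def most_probable_action(mu, sigma, n_grid=20001):
    """Grid search for the mode of the squashed density on (-1, 1)."""
    best_a, best_lp = 0.0, -math.inf
    for i in range(1, n_grid):
        a = -1.0 + 2.0 * i / n_grid  # strictly inside (-1, 1)
        lp = squashed_log_density(a, mu, sigma)
        if lp > best_lp:
            best_a, best_lp = a, lp
    return best_a


mu, sigma = 0.5, 0.5
naive = math.tanh(mu)                   # squashing the Gaussian mean
mode = most_probable_action(mu, sigma)  # mode of the squashed density
```

For these toy parameters the mode of the squashed density lies closer to the action bound than `tanh(mu)`, which is the kind of mismatch the abstract refers to.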

MAC Revivo: Artificial Intelligence Paves the Way

Oct 21, 2024

Jinzhe Pan, Jingqing Wang, Zelin Yun, Zhiyong Xiao, Yuehui Ouyang, Wenchi Cheng, Wei Zhang

Abstract: The vast adoption of Wi-Fi and/or Bluetooth capabilities in Internet of Things (IoT) devices, along with the rapid growth of deployed smart devices, has caused significant interference and congestion in the industrial, scientific, and medical (ISM) bands. Traditional Wi-Fi Medium Access Control (MAC) design faces significant challenges in managing increasingly complex wireless environments while ensuring network Quality of Service (QoS) performance. This paper explores the potential integration of advanced Artificial Intelligence (AI) methods into the design of Wi-Fi MAC protocols. We propose AI-MAC, an innovative approach that employs machine learning algorithms to dynamically adapt to changing network conditions, optimize channel access, mitigate interference, and ensure deterministic latency. By intelligently predicting and managing interference, AI-MAC aims to provide a robust solution for the next generation of Wi-Fi networks, enabling seamless connectivity and enhanced QoS. Our experimental results demonstrate that AI-MAC significantly reduces both interference and latency, paving the way for more reliable and efficient wireless communications in the increasingly crowded ISM band.

Real-time Stereo-based 3D Object Detection for Streaming Perception

Oct 16, 2024

Changcai Li, Zonghua Gu, Gang Chen, Libo Huang, Wei Zhang, Huihui Zhou

Abstract: The ability to promptly respond to environmental changes is crucial for the perception system of autonomous driving. Recently, a new task called streaming perception was proposed. It jointly evaluates latency and accuracy in a single metric for online video perception. In this work, we introduce StreamDSGN, the first real-time stereo-based 3D object detection framework designed for streaming perception. StreamDSGN is an end-to-end framework that directly predicts the 3D properties of objects in the next moment by leveraging historical information, thereby alleviating the accuracy degradation of streaming perception. Further, StreamDSGN applies three strategies to enhance the perception accuracy: (1) A feature-flow-based fusion method, which generates a pseudo-next feature at the current moment to address the misalignment issue between feature and ground truth. (2) An extra regression loss for explicit supervision of object motion consistency in consecutive frames. (3) A large kernel backbone with a large receptive field for effectively capturing long-range spatial contextual features caused by changes in object positions. Experiments on the KITTI Tracking dataset show that, compared with the strong baseline, StreamDSGN significantly improves the streaming average precision by up to 4.33%. Our code is available at https://github.com/weiyangdaren/streamDSGN-pytorch.

* Streaming Perception, 3D Object Detection, NeurIPS2024 poster

Dual-AEB: Synergizing Rule-Based and Multimodal Large Language Models for Effective Emergency Braking

Oct 11, 2024

Wei Zhang, Pengfei Li, Junli Wang, Bingchuan Sun, Qihao Jin, Guangjun Bao, Shibo Rui, Yang Yu, Wenchao Ding, Peng Li (+1 more)

Abstract: Automatic Emergency Braking (AEB) systems are a crucial component in ensuring the safety of passengers in autonomous vehicles. Conventional AEB systems primarily rely on closed-set perception modules to recognize traffic conditions and assess collision risks. To enhance the adaptability of AEB systems in open scenarios, we propose Dual-AEB, a system that combines an advanced multimodal large language model (MLLM) for comprehensive scene understanding and a conventional rule-based rapid AEB to ensure quick response times. To the best of our knowledge, Dual-AEB is the first method to incorporate MLLMs within AEB systems. Through extensive experimentation, we have validated the effectiveness of our method. The source code will be available at https://github.com/ChipsICU/Dual-AEB.

ERCache: An Efficient and Reliable Caching Framework for Large-Scale User Representations in Meta's Ads System

Oct 09, 2024

Fang Zhou, Yaning Huang, Dong Liang, Dai Li, Zhongke Zhang, Kai Wang, Xiao Xin, Abdallah Aboelela, Zheliang Jiang, Yang Wang (+14 more)

Abstract: The increasing complexity of deep learning models used for calculating user representations presents significant challenges, particularly with limited computational resources and strict service-level agreements (SLAs). Previous research efforts have focused on optimizing model inference but have overlooked a critical question: is it necessary to perform user model inference for every ad request in large-scale social networks? To address this question and these challenges, we first analyze user access patterns at Meta and find that most user model inferences occur within a short timeframe. This observation reveals a triangular relationship among model complexity, embedding freshness, and service SLAs. Building on this insight, we designed, implemented, and evaluated ERCache, an efficient and robust caching framework for large-scale user representations in ads recommendation systems on social networks. ERCache categorizes cache into direct and failover types and applies customized settings and eviction policies for each model, effectively balancing model complexity, embedding freshness, and service SLAs, even considering the staleness introduced by caching. ERCache has been deployed at Meta for over six months, supporting more than 30 ranking models while efficiently conserving computational resources and complying with service SLA requirements.
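
The split into direct and failover caches described in this abstract might look roughly like the sketch below. It is a hypothetical illustration, not Meta's implementation. It assumes a short time-to-live (TTL) for direct hits that skip inference, and a longer TTL for failover hits that are served only when inference fails. The class name, the TTL-based policy, and the injected clock are all assumptions.

```python
import time


class TwoTierEmbeddingCache:
    """Hypothetical direct + failover cache for user embeddings."""

    def __init__(self, direct_ttl, failover_ttl, clock=time.monotonic):
        self.direct_ttl = direct_ttl      # freshness window for skipping inference
        self.failover_ttl = failover_ttl  # staleness tolerated when inference fails
        self.clock = clock
        self.store = {}                   # user_id -> (embedding, written_at)

    def _fresh(self, user_id, ttl):
        entry = self.store.get(user_id)
        if entry is not None and self.clock() - entry[1] <= ttl:
            return entry[0]
        return None

    def get(self, user_id, infer):
        """Return (embedding, source); source names the path that served it."""
        cached = self._fresh(user_id, self.direct_ttl)
        if cached is not None:
            return cached, "direct"
        try:
            embedding = infer(user_id)
        except Exception:
            # Inference failed: fall back to a staler entry if one is allowed.
            stale = self._fresh(user_id, self.failover_ttl)
            if stale is not None:
                return stale, "failover"
            raise
        self.store[user_id] = (embedding, self.clock())
        return embedding, "inference"


# Usage with a fake clock so the example is deterministic.
now = [0.0]
cache = TwoTierEmbeddingCache(direct_ttl=60, failover_ttl=3600, clock=lambda: now[0])
calls = []


def infer(user_id):
    calls.append(user_id)
    return [0.1, 0.2]


def broken(user_id):
    raise RuntimeError("model unavailable")


first = cache.get("u1", infer)    # miss: runs inference and fills the cache
second = cache.get("u1", infer)   # within direct_ttl: served from cache
now[0] = 120.0                    # direct entry expired, failover still valid
third = cache.get("u1", broken)   # inference fails: failover entry is used
```

Keeping the two TTLs separate is what lets such a design trade embedding freshness against compute and availability on a per-model basis.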

The Accuracy Paradox in RLHF: When Better Reward Models Don't Yield Better Language Models

Oct 09, 2024

Yanjun Chen, Dawei Zhu, Yirong Sun, Xinghao Chen, Wei Zhang, Xiaoyu Shen

Abstract: Reinforcement Learning from Human Feedback (RLHF) significantly enhances Natural Language Processing by aligning language models with human expectations. A critical factor in this alignment is the strength of reward models used during training. This study explores whether stronger reward models invariably lead to better language models. Through experiments on relevance, factuality, and completeness tasks using the QA-FEEDBACK dataset and reward models based on Longformer, we uncover a surprising paradox: language models trained with moderately accurate reward models outperform those guided by highly accurate ones. This challenges the widely held belief that stronger reward models always lead to better language models, and opens up new avenues for future research into the key factors driving model performance and how to choose the most suitable reward models. Code and additional details are available at [https://github.com/EIT-NLP/AccuracyParadox-RLHF](https://github.com/EIT-NLP/AccuracyParadox-RLHF).

* 10 pages, 27 figures (including 18 in the appendix), submitted to EMNLP 2024
