Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Haitao Wang

One Model to Rank Them All: Unifying Online Advertising with End-to-End Learning

May 26, 2025

Junyan Qiu, Ze Wang, Fan Zhang, Zuowu Zheng, Jile Zhu, Jiangke Fan, Teng Zhang, Haitao Wang, Xingxing Wang

Abstract:Modern industrial advertising systems commonly employ Multi-stage Cascading Architectures (MCA) to balance computational efficiency with ranking accuracy. However, this approach presents two fundamental challenges: (1) performance inconsistencies arising from divergent optimization targets and capability differences between stages, and (2) failure to account for advertisement externalities - the complex interactions between candidate ads during ranking. These limitations ultimately compromise system effectiveness and reduce platform profitability. In this paper, we present UniROM, an end-to-end generative architecture that Unifies online advertising Ranking as One Model. UniROM replaces cascaded stages with a single model to directly generate optimal ad sequences from the full candidate ad corpus in location-based services (LBS). The primary challenges associated with this approach stem from high costs of feature processing and computational bottlenecks in modeling externalities of large-scale candidate pools. To address these challenges, UniROM introduces an algorithm and engine co-designed hybrid feature service to decouple user and ad feature processing, reducing latency while preserving expressiveness. To efficiently extract intra- and cross-sequence mutual information, we propose RecFormer with an innovative cluster-attention mechanism as its core architectural component. Furthermore, we propose a bi-stage training strategy that integrates pre-training with reinforcement learning-based post-training to meet sophisticated platform and advertising objectives. Extensive offline evaluations on public benchmarks and large-scale online A/B testing on industrial advertising platform have demonstrated the superior performance of UniROM over state-of-the-art MCAs.

Via

Access Paper or Ask Questions

Improving Out-of-Domain Robustness with Targeted Augmentation in Frequency and Pixel Spaces

May 18, 2025

Ruoqi Wang, Haitao Wang, Shaojie Guo, Qiong Luo

Abstract:Out-of-domain (OOD) robustness under domain adaptation settings, where labeled source data and unlabeled target data come from different distributions, is a key challenge in real-world applications. A common approach to improving OOD robustness is through data augmentations. However, in real-world scenarios, models trained with generic augmentations can only improve marginally when generalized under distribution shifts toward unlabeled target domains. While dataset-specific targeted augmentations can address this issue, they typically require expert knowledge and extensive prior data analysis to identify the nature of the datasets and domain shift. To address these challenges, we propose Frequency-Pixel Connect, a domain-adaptation framework that enhances OOD robustness by introducing a targeted augmentation in both the frequency space and pixel space. Specifically, we mix the amplitude spectrum and pixel content of a source image and a target image to generate augmented samples that introduce domain diversity while preserving the semantic structure of the source image. Unlike previous targeted augmentation methods that are both dataset-specific and limited to the pixel space, Frequency-Pixel Connect is dataset-agnostic, enabling broader and more flexible applicability beyond natural image datasets. We further analyze the effectiveness of Frequency-Pixel Connect by evaluating the performance of our method connecting same-class cross-domain samples while separating different-class examples. We demonstrate that Frequency-Pixel Connect significantly improves cross-domain connectivity and outperforms previous generic methods on four diverse real-world benchmarks across vision, medical, audio, and astronomical domains, and it also outperforms other dataset-specific targeted augmentation methods.

Via

Access Paper or Ask Questions

A Cognitive-Mechanistic Human Reliability Analysis Framework: A Nuclear Power Plant Case Study

Apr 25, 2025

Xingyu Xiao, Peng Chen, Jiejuan Tong, Shunshun Liu, Hongru Zhao, Jun Zhao, Qianqian Jia, Jingang Liang, Haitao Wang

Abstract:Traditional human reliability analysis (HRA) methods, such as IDHEAS-ECA, rely on expert judgment and empirical rules that often overlook the cognitive underpinnings of human error. Moreover, conducting human-in-the-loop experiments for advanced nuclear power plants is increasingly impractical due to novel interfaces and limited operational data. This study proposes a cognitive-mechanistic framework (COGMIF) that enhances the IDHEAS-ECA methodology by integrating an ACT-R-based human digital twin (HDT) with TimeGAN-augmented simulation. The ACT-R model simulates operator cognition, including memory retrieval, goal-directed procedural reasoning, and perceptual-motor execution, under high-fidelity scenarios derived from a high-temperature gas-cooled reactor (HTGR) simulator. To overcome the resource constraints of large-scale cognitive modeling, TimeGAN is trained on ACT-R-generated time-series data to produce high-fidelity synthetic operator behavior datasets. These simulations are then used to drive IDHEAS-ECA assessments, enabling scalable, mechanism-informed estimation of human error probabilities (HEPs). Comparative analyses with SPAR-H and sensitivity assessments demonstrate the robustness and practical advantages of the proposed COGMIF. Finally, procedural features are mapped onto a Bayesian network to quantify the influence of contributing factors, revealing key drivers of operational risk. This work offers a credible and computationally efficient pathway to integrate cognitive theory into industrial HRA practices.

Via

Access Paper or Ask Questions

TDFANet: Encoding Sequential 4D Radar Point Clouds Using Trajectory-Guided Deformable Feature Aggregation for Place Recognition

Apr 07, 2025

Shouyi Lu, Guirong Zhuo, Haitao Wang, Quan Zhou, Huanyu Zhou, Renbo Huang, Minqing Huang, Lianqing Zheng, Qiang Shu

Abstract:Place recognition is essential for achieving closed-loop or global positioning in autonomous vehicles and mobile robots. Despite recent advancements in place recognition using 2D cameras or 3D LiDAR, it remains to be seen how to use 4D radar for place recognition - an increasingly popular sensor for its robustness against adverse weather and lighting conditions. Compared to LiDAR point clouds, radar data are drastically sparser, noisier and in much lower resolution, which hampers their ability to effectively represent scenes, posing significant challenges for 4D radar-based place recognition. This work addresses these challenges by leveraging multi-modal information from sequential 4D radar scans and effectively extracting and aggregating spatio-temporal features.Our approach follows a principled pipeline that comprises (1) dynamic points removal and ego-velocity estimation from velocity property, (2) bird's eye view (BEV) feature encoding on the refined point cloud, (3) feature alignment using BEV feature map motion trajectory calculated by ego-velocity, (4) multi-scale spatio-temporal features of the aligned BEV feature maps are extracted and aggregated.Real-world experimental results validate the feasibility of the proposed method and demonstrate its robustness in handling dynamic environments. Source codes are available.

* 8 pages, 4 figures. Accepted to ICRA 2025

Via

Access Paper or Ask Questions

KRAIL: A Knowledge-Driven Framework for Base Human Reliability Analysis Integrating IDHEAS and Large Language Models

Dec 20, 2024

Xingyu Xiao, Peng Chen, Ben Qi, Hongru Zhao, Jingang Liang, Jiejuan Tong, Haitao Wang

Abstract:Human reliability analysis (HRA) is crucial for evaluating and improving the safety of complex systems. Recent efforts have focused on estimating human error probability (HEP), but existing methods often rely heavily on expert knowledge,which can be subjective and time-consuming. Inspired by the success of large language models (LLMs) in natural language processing, this paper introduces a novel two-stage framework for knowledge-driven reliability analysis, integrating IDHEAS and LLMs (KRAIL). This innovative framework enables the semi-automated computation of base HEP values. Additionally, knowledge graphs are utilized as a form of retrieval-augmented generation (RAG) for enhancing the framework' s capability to retrieve and process relevant data efficiently. Experiments are systematically conducted and evaluated on authoritative datasets of human reliability. The experimental results of the proposed methodology demonstrate its superior performance on base HEP estimation under partial information for reliability assessment.

Via

Access Paper or Ask Questions

Effective Fine-Tuning of Vision-Language Models for Accurate Galaxy Morphology Analysis

Nov 29, 2024

Ruoqi Wang, Haitao Wang, Qiong Luo

Abstract:Galaxy morphology analysis involves classifying galaxies by their shapes and structures. For this task, directly training domain-specific models on large, annotated astronomical datasets is effective but costly. In contrast, fine-tuning vision foundation models on a smaller set of astronomical images is more resource-efficient but generally results in lower accuracy. To harness the benefits of both approaches and address their shortcomings, we propose GalaxAlign, a novel method that fine-tunes pre-trained foundation models to achieve high accuracy on astronomical tasks. Specifically, our method extends a contrastive learning architecture to align three types of data in fine-tuning: (1) a set of schematic symbols representing galaxy shapes and structures, (2) textual labels of these symbols, and (3) galaxy images. This way, GalaxAlign not only eliminates the need for expensive pretraining but also enhances the effectiveness of fine-tuning. Extensive experiments on galaxy classification and similarity search demonstrate that our method effectively fine-tunes general pre-trained models for astronomical tasks by incorporating domain-specific multi-modal knowledge.

Via

Access Paper or Ask Questions

VisRec: A Semi-Supervised Approach to Radio Interferometric Data Reconstruction

Mar 01, 2024

Ruoqi Wang, Haitao Wang, Qiong Luo, Feng Wang, Hejun Wu

Abstract:Radio telescopes produce visibility data about celestial objects, but these data are sparse and noisy. As a result, images created on raw visibility data are of low quality. Recent studies have used deep learning models to reconstruct visibility data to get cleaner images. However, these methods rely on a substantial amount of labeled training data, which requires significant labeling effort from radio astronomers. Addressing this challenge, we propose VisRec, a model-agnostic semi-supervised learning approach to the reconstruction of visibility data. Specifically, VisRec consists of both a supervised learning module and an unsupervised learning module. In the supervised learning module, we introduce a set of data augmentation functions to produce diverse training examples. In comparison, the unsupervised learning module in VisRec augments unlabeled data and uses reconstructions from non-augmented visibility data as pseudo-labels for training. This hybrid approach allows VisRec to effectively leverage both labeled and unlabeled data. This way, VisRec performs well even when labeled data is scarce. Our evaluation results show that VisRec outperforms all baseline methods in reconstruction quality, robustness against common observation perturbation, and generalizability to different telescope configurations.

Via

Access Paper or Ask Questions

ControlRec: Bridging the Semantic Gap between Language Model and Personalized Recommendation

Nov 28, 2023

Junyan Qiu, Haitao Wang, Zhaolin Hong, Yiping Yang, Qiang Liu, Xingxing Wang

Figure 1 for ControlRec: Bridging the Semantic Gap between Language Model and Personalized Recommendation

Figure 2 for ControlRec: Bridging the Semantic Gap between Language Model and Personalized Recommendation

Figure 3 for ControlRec: Bridging the Semantic Gap between Language Model and Personalized Recommendation

Figure 4 for ControlRec: Bridging the Semantic Gap between Language Model and Personalized Recommendation

Abstract:The successful integration of large language models (LLMs) into recommendation systems has proven to be a major breakthrough in recent studies, paving the way for more generic and transferable recommendations. However, LLMs struggle to effectively utilize user and item IDs, which are crucial identifiers for successful recommendations. This is mainly due to their distinct representation in a semantic space that is different from the natural language (NL) typically used to train LLMs. To tackle such issue, we introduce ControlRec, an innovative Contrastive prompt learning framework for Recommendation systems. ControlRec treats user IDs and NL as heterogeneous features and encodes them individually. To promote greater alignment and integration between them in the semantic space, we have devised two auxiliary contrastive objectives: (1) Heterogeneous Feature Matching (HFM) aligning item description with the corresponding ID or user's next preferred ID based on their interaction sequence, and (2) Instruction Contrastive Learning (ICL) effectively merging these two crucial data sources by contrasting probability distributions of output sequences generated by diverse tasks. Experimental results on four public real-world datasets demonstrate the effectiveness of the proposed method on improving model performance.

* 11 pages, 7 figures

Via

Access Paper or Ask Questions

AudioFormer: Audio Transformer learns audio feature representations from discrete acoustic codes

Aug 25, 2023

Zhaohui Li, Haitao Wang, Xinghua Jiang

Figure 1 for AudioFormer: Audio Transformer learns audio feature representations from discrete acoustic codes

Figure 2 for AudioFormer: Audio Transformer learns audio feature representations from discrete acoustic codes

Figure 3 for AudioFormer: Audio Transformer learns audio feature representations from discrete acoustic codes

Figure 4 for AudioFormer: Audio Transformer learns audio feature representations from discrete acoustic codes

Abstract:We propose a method named AudioFormer,which learns audio feature representations through the acquisition of discrete acoustic codes and subsequently fine-tunes them for audio classification tasks. Initially,we introduce a novel perspective by considering the audio classification task as a form of natural language understanding (NLU). Leveraging an existing neural audio codec model,we generate discrete acoustic codes and utilize them to train a masked language model (MLM),thereby obtaining audio feature representations. Furthermore,we pioneer the integration of a Multi-Positive sample Contrastive (MPC) learning approach. This method enables the learning of joint representations among multiple discrete acoustic codes within the same audio input. In our experiments,we treat discrete acoustic codes as textual data and train a masked language model using a cloze-like methodology,ultimately deriving high-quality audio representations. Notably,the MPC learning technique effectively captures collaborative representations among distinct positive samples. Our research outcomes demonstrate that AudioFormer attains significantly improved performance compared to prevailing monomodal audio classification models across multiple datasets,and even outperforms audio-visual multimodal classification models on select datasets. Specifically,our approach achieves remarkable results on datasets including AudioSet (2M,20K),and FSD50K,with performance scores of 53.9,45.1,and 65.6,respectively. We have openly shared both the code and models: https://github.com/LZH-0225/AudioFormer.git.

* Need to supplement more detailed experiments

Via

Access Paper or Ask Questions

MSMix:An Interpolation-Based Text Data Augmentation Method Manifold Swap Mixup

May 31, 2023

Mao Ye, Haitao Wang, Zheqian Chen

Abstract:To solve the problem of poor performance of deep neural network models due to insufficient data, a simple yet effective interpolation-based data augmentation method is proposed: MSMix (Manifold Swap Mixup). This method feeds two different samples to the same deep neural network model, and then randomly select a specific layer and partially replace hidden features at that layer of one of the samples by the counterpart of the other. The mixed hidden features are fed to the model and go through the rest of the network. Two different selection strategies are also proposed to obtain richer hidden representation. Experiments are conducted on three Chinese intention recognition datasets, and the results show that the MSMix method achieves better results than other methods in both full-sample and small-sample configurations.

Via

Access Paper or Ask Questions