Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Bo Huang

Beijing StoneWise Technology Co Ltd

AttenST: A Training-Free Attention-Driven Style Transfer Framework with Pre-Trained Diffusion Models

Mar 10, 2025

Bo Huang, Wenlun Xu, Qizhuo Han, Haodong Jing, Ying Li

Abstract:While diffusion models have achieved remarkable progress in style transfer tasks, existing methods typically rely on fine-tuning or optimizing pre-trained models during inference, leading to high computational costs and challenges in balancing content preservation with style integration. To address these limitations, we introduce AttenST, a training-free attention-driven style transfer framework. Specifically, we propose a style-guided self-attention mechanism that conditions self-attention on the reference style by retaining the query of the content image while substituting its key and value with those from the style image, enabling effective style feature integration. To mitigate style information loss during inversion, we introduce a style-preserving inversion strategy that refines inversion accuracy through multiple resampling steps. Additionally, we propose a content-aware adaptive instance normalization, which integrates content statistics into the normalization process to optimize style fusion while mitigating the content degradation. Furthermore, we introduce a dual-feature cross-attention mechanism to fuse content and style features, ensuring a harmonious synthesis of structural fidelity and stylistic expression. Extensive experiments demonstrate that AttenST outperforms existing methods, achieving state-of-the-art performance in style transfer dataset.

Via

Access Paper or Ask Questions

MANet: Fine-Tuning Segment Anything Model for Multimodal Remote Sensing Semantic Segmentation

Oct 15, 2024

Xianping Ma, Xiaokang Zhang, Man-On Pun, Bo Huang

Figure 1 for MANet: Fine-Tuning Segment Anything Model for Multimodal Remote Sensing Semantic Segmentation

Figure 2 for MANet: Fine-Tuning Segment Anything Model for Multimodal Remote Sensing Semantic Segmentation

Figure 3 for MANet: Fine-Tuning Segment Anything Model for Multimodal Remote Sensing Semantic Segmentation

Figure 4 for MANet: Fine-Tuning Segment Anything Model for Multimodal Remote Sensing Semantic Segmentation

Abstract:Multimodal remote sensing data, collected from a variety of sensors, provide a comprehensive and integrated perspective of the Earth's surface. By employing multimodal fusion techniques, semantic segmentation offers more detailed insights into geographic scenes compared to single-modality approaches. Building upon recent advancements in vision foundation models, particularly the Segment Anything Model (SAM), this study introduces a novel Multimodal Adapter-based Network (MANet) for multimodal remote sensing semantic segmentation. At the core of this approach is the development of a Multimodal Adapter (MMAdapter), which fine-tunes SAM's image encoder to effectively leverage the model's general knowledge for multimodal data. In addition, a pyramid-based Deep Fusion Module (DFM) is incorporated to further integrate high-level geographic features across multiple scales before decoding. This work not only introduces a novel network for multimodal fusion, but also demonstrates, for the first time, SAM's powerful generalization capabilities with Digital Surface Model (DSM) data. Experimental results on two well-established fine-resolution multimodal remote sensing datasets, ISPRS Vaihingen and ISPRS Potsdam, confirm that the proposed MANet significantly surpasses current models in the task of multimodal semantic segmentation. The source code for this work will be accessible at https://github.com/sstary/SSRS.

* 12 pages, 9 figures

Via

Access Paper or Ask Questions

Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization

Aug 14, 2024

Yuxin Jiang, Bo Huang, Yufei Wang, Xingshan Zeng, Liangyou Li, Yasheng Wang, Xin Jiang, Lifeng Shang, Ruiming Tang, Wei Wang

Figure 1 for Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization

Figure 2 for Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization

Figure 3 for Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization

Figure 4 for Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization

Abstract:Direct preference optimization (DPO), a widely adopted offline preference optimization algorithm, aims to align large language models (LLMs) with human-desired behaviors using pairwise preference data. However, the winning response and the losing response within pairwise data are generated isolatedly, leading to weak correlations between them as well as suboptimal alignment performance. To address this issue, we propose an effective framework named BMC, for bridging and modeling correlations in pairwise data. Firstly, we increase the consistency and informativeness of the pairwise preference signals by targeted modifications, synthesizing a pseudo winning response through improving the losing response based on the winning response. Secondly, we identify that DPO alone is insufficient to model these correlations and capture nuanced variations. Therefore, we propose learning token-level correlations by dynamically leveraging the policy model's confidence during training. Comprehensive experiments on QA, math, and instruction-following tasks demonstrate the effectiveness of our approach, significantly surpassing competitive baselines, including DPO. Additionally, our in-depth quantitative analysis reveals the reasons behind our method's superior performance over DPO and showcases its versatility to other DPO variants.

* 18 pages, 8 figures, 8 tables, working in progress

Via

Access Paper or Ask Questions

Your decision path does matter in pre-training industrial recommenders with multi-source behaviors

May 27, 2024

Chunjing Gan, Binbin Hu, Bo Huang, Ziqi Liu, Jian Ma, Zhiqiang Zhang, Wenliang Zhong, Jun Zhou

Figure 1 for Your decision path does matter in pre-training industrial recommenders with multi-source behaviors

Figure 2 for Your decision path does matter in pre-training industrial recommenders with multi-source behaviors

Figure 3 for Your decision path does matter in pre-training industrial recommenders with multi-source behaviors

Figure 4 for Your decision path does matter in pre-training industrial recommenders with multi-source behaviors

Abstract:Online service platforms offering a wide range of services through miniapps have become crucial for users who visit these platforms with clear intentions to find services they are interested in. Aiming at effective content delivery, cross-domain recommendation are introduced to learn high-quality representations by transferring behaviors from data-rich scenarios. However, these methods overlook the impact of the decision path that users take when conduct behaviors, that is, users ultimately exhibit different behaviors based on various intents. To this end, we propose HIER, a novel Hierarchical decIsion path Enhanced Representation learning for cross-domain recommendation. With the help of graph neural networks for high-order topological information of the knowledge graph between multi-source behaviors, we further adaptively learn decision paths through well-designed exemplar-level and information bottleneck based contrastive learning. Extensive experiments in online and offline environments show the superiority of HIER.

Via

Access Paper or Ask Questions

Xiwu: A Basis Flexible and Learnable LLM for High Energy Physics

Apr 08, 2024

Zhengde Zhang, Yiyu Zhang, Haodong Yao, Jianwen Luo, Rui Zhao, Bo Huang, Jiameng Zhao, Yipu Liao, Ke Li, Lina Zhao(+3 more)

Abstract:Large Language Models (LLMs) are undergoing a period of rapid updates and changes, with state-of-the-art (SOTA) model frequently being replaced. When applying LLMs to a specific scientific field, it's challenging to acquire unique domain knowledge while keeping the model itself advanced. To address this challenge, a sophisticated large language model system named as Xiwu has been developed, allowing you switch between the most advanced foundation models and quickly teach the model domain knowledge. In this work, we will report on the best practices for applying LLMs in the field of high-energy physics (HEP), including: a seed fission technology is proposed and some data collection and cleaning tools are developed to quickly obtain domain AI-Ready dataset; a just-in-time learning system is implemented based on the vector store technology; an on-the-fly fine-tuning system has been developed to facilitate rapid training under a specified foundation model. The results show that Xiwu can smoothly switch between foundation models such as LLaMA, Vicuna, ChatGLM and Grok-1. The trained Xiwu model is significantly outperformed the benchmark model on the HEP knowledge question-and-answering and code generation. This strategy significantly enhances the potential for growth of our model's performance, with the hope of surpassing GPT-4 as it evolves with the development of open-source models. This work provides a customized LLM for the field of HEP, while also offering references for applying LLM to other fields, the corresponding codes are available on Github.

* 15 pages, 8 figures

Via

Access Paper or Ask Questions

LLM-Assisted Light: Leveraging Large Language Model Capabilities for Human-Mimetic Traffic Signal Control in Complex Urban Environments

Mar 13, 2024

Maonan Wang, Aoyu Pang, Yuheng Kan, Man-On Pun, Chung Shue Chen, Bo Huang

Figure 1 for LLM-Assisted Light: Leveraging Large Language Model Capabilities for Human-Mimetic Traffic Signal Control in Complex Urban Environments

Figure 2 for LLM-Assisted Light: Leveraging Large Language Model Capabilities for Human-Mimetic Traffic Signal Control in Complex Urban Environments

Figure 3 for LLM-Assisted Light: Leveraging Large Language Model Capabilities for Human-Mimetic Traffic Signal Control in Complex Urban Environments

Figure 4 for LLM-Assisted Light: Leveraging Large Language Model Capabilities for Human-Mimetic Traffic Signal Control in Complex Urban Environments

Abstract:Traffic congestion in metropolitan areas presents a formidable challenge with far-reaching economic, environmental, and societal ramifications. Therefore, effective congestion management is imperative, with traffic signal control (TSC) systems being pivotal in this endeavor. Conventional TSC systems, designed upon rule-based algorithms or reinforcement learning (RL), frequently exhibit deficiencies in managing the complexities and variabilities of urban traffic flows, constrained by their limited capacity for adaptation to unfamiliar scenarios. In response to these limitations, this work introduces an innovative approach that integrates Large Language Models (LLMs) into TSC, harnessing their advanced reasoning and decision-making faculties. Specifically, a hybrid framework that augments LLMs with a suite of perception and decision-making tools is proposed, facilitating the interrogation of both the static and dynamic traffic information. This design places the LLM at the center of the decision-making process, combining external traffic data with established TSC methods. Moreover, a simulation platform is developed to corroborate the efficacy of the proposed framework. The findings from our simulations attest to the system's adeptness in adjusting to a multiplicity of traffic environments without the need for additional training. Notably, in cases of Sensor Outage (SO), our approach surpasses conventional RL-based systems by reducing the average waiting time by $20.4\%$. This research signifies a notable advance in TSC strategies and paves the way for the integration of LLMs into real-world, dynamic scenarios, highlighting their potential to revolutionize traffic management. The related code is available at \href{https://github.com/Traffic-Alpha/LLM-Assisted-Light}{https://github.com/Traffic-Alpha/LLM-Assisted-Light}.

* 15 pages

Via

Access Paper or Ask Questions

MetaSeg: Content-Aware Meta-Net for Omni-Supervised Semantic Segmentation

Jan 22, 2024

Shenwang Jiang, Jianan Li, Ying Wang, Wenxuan Wu, Jizhou Zhang, Bo Huang, Tingfa Xu

Figure 1 for MetaSeg: Content-Aware Meta-Net for Omni-Supervised Semantic Segmentation

Figure 2 for MetaSeg: Content-Aware Meta-Net for Omni-Supervised Semantic Segmentation

Figure 3 for MetaSeg: Content-Aware Meta-Net for Omni-Supervised Semantic Segmentation

Figure 4 for MetaSeg: Content-Aware Meta-Net for Omni-Supervised Semantic Segmentation

Abstract:Noisy labels, inevitably existing in pseudo segmentation labels generated from weak object-level annotations, severely hampers model optimization for semantic segmentation. Previous works often rely on massive hand-crafted losses and carefully-tuned hyper-parameters to resist noise, suffering poor generalization capability and high model complexity. Inspired by recent advances in meta learning, we argue that rather than struggling to tolerate noise hidden behind clean labels passively, a more feasible solution would be to find out the noisy regions actively, so as to simply ignore them during model optimization. With this in mind, this work presents a novel meta learning based semantic segmentation method, MetaSeg, that comprises a primary content-aware meta-net (CAM-Net) to sever as a noise indicator for an arbitrary segmentation model counterpart. Specifically, CAM-Net learns to generate pixel-wise weights to suppress noisy regions with incorrect pseudo labels while highlighting clean ones by exploiting hybrid strengthened features from image content, providing straightforward and reliable guidance for optimizing the segmentation model. Moreover, to break the barrier of time-consuming training when applying meta learning to common large segmentation models, we further present a new decoupled training strategy that optimizes different model layers in a divide-and-conquer manner. Extensive experiments on object, medical, remote sensing and human segmentation shows that our method achieves superior performance, approaching that of fully supervised settings, which paves a new promising way for omni-supervised semantic segmentation.

Via

Access Paper or Ask Questions

Dataset Distillation via Adversarial Prediction Matching

Dec 14, 2023

Mingyang Chen, Bo Huang, Junda Lu, Bing Li, Yi Wang, Minhao Cheng, Wei Wang

Figure 1 for Dataset Distillation via Adversarial Prediction Matching

Figure 2 for Dataset Distillation via Adversarial Prediction Matching

Figure 3 for Dataset Distillation via Adversarial Prediction Matching

Figure 4 for Dataset Distillation via Adversarial Prediction Matching

Abstract:Dataset distillation is the technique of synthesizing smaller condensed datasets from large original datasets while retaining necessary information to persist the effect. In this paper, we approach the dataset distillation problem from a novel perspective: we regard minimizing the prediction discrepancy on the real data distribution between models, which are respectively trained on the large original dataset and on the small distilled dataset, as a conduit for condensing information from the raw data into the distilled version. An adversarial framework is proposed to solve the problem efficiently. In contrast to existing distillation methods involving nested optimization or long-range gradient unrolling, our approach hinges on single-level optimization. This ensures the memory efficiency of our method and provides a flexible tradeoff between time and memory budgets, allowing us to distil ImageNet-1K using a minimum of only 6.5GB of GPU memory. Under the optimal tradeoff strategy, it requires only 2.5$\times$ less memory and 5$\times$ less runtime compared to the state-of-the-art. Empirically, our method can produce synthetic datasets just 10% the size of the original, yet achieve, on average, 94% of the test accuracy of models trained on the full original datasets including ImageNet-1K, significantly surpassing state-of-the-art. Additionally, extensive tests reveal that our distilled datasets excel in cross-architecture generalization capabilities.

Via

Access Paper or Ask Questions

SAM-Assisted Remote Sensing Imagery Semantic Segmentation with Object and Boundary Constraints

Dec 05, 2023

Xianping Ma, Qianqian Wu, Xingyu Zhao, Xiaokang Zhang, Man-On Pun, Bo Huang

Figure 1 for SAM-Assisted Remote Sensing Imagery Semantic Segmentation with Object and Boundary Constraints

Figure 2 for SAM-Assisted Remote Sensing Imagery Semantic Segmentation with Object and Boundary Constraints

Figure 3 for SAM-Assisted Remote Sensing Imagery Semantic Segmentation with Object and Boundary Constraints

Figure 4 for SAM-Assisted Remote Sensing Imagery Semantic Segmentation with Object and Boundary Constraints

Abstract:Semantic segmentation of remote sensing imagery plays a pivotal role in extracting precise information for diverse down-stream applications. Recent development of the Segment Anything Model (SAM), an advanced general-purpose segmentation model, has revolutionized this field, presenting new avenues for accurate and efficient segmentation. However, SAM is limited to generating segmentation results without class information. Consequently, the utilization of such a powerful general vision model for semantic segmentation in remote sensing images has become a focal point of research. In this paper, we present a streamlined framework aimed at leveraging the raw output of SAM by exploiting two novel concepts called SAM-Generated Object (SGO) and SAM-Generated Boundary (SGB). More specifically, we propose a novel object loss and further introduce a boundary loss as augmentative components to aid in model optimization in a general semantic segmentation framework. Taking into account the content characteristics of SGO, we introduce the concept of object consistency to leverage segmented regions lacking semantic information. By imposing constraints on the consistency of predicted values within objects, the object loss aims to enhance semantic segmentation performance. Furthermore, the boundary loss capitalizes on the distinctive features of SGB by directing the model's attention to the boundary information of the object. Experimental results on two well-known datasets, namely ISPRS Vaihingen and LoveDA Urban, demonstrate the effectiveness of our proposed method. The source code for this work will be accessible at https://github.com/sstary/SSRS.

* 10 pages, 4 figures

Via

Access Paper or Ask Questions

PEACE: Prototype lEarning Augmented transferable framework for Cross-domain rEcommendation

Dec 04, 2023

Chunjing Gan, Bo Huang, Binbin Hu, Jian Ma, Ziqi Liu, Zhiqiang Zhang, Jun Zhou, Guannan Zhang, Wenliang Zhong

Figure 1 for PEACE: Prototype lEarning Augmented transferable framework for Cross-domain rEcommendation

Figure 2 for PEACE: Prototype lEarning Augmented transferable framework for Cross-domain rEcommendation

Figure 3 for PEACE: Prototype lEarning Augmented transferable framework for Cross-domain rEcommendation

Figure 4 for PEACE: Prototype lEarning Augmented transferable framework for Cross-domain rEcommendation

Abstract:To help merchants/customers to provide/access a variety of services through miniapps, online service platforms have occupied a critical position in the effective content delivery, in which how to recommend items in the new domain launched by the service provider for customers has become more urgent. However, the non-negligible gap between the source and diversified target domains poses a considerable challenge to cross-domain recommendation systems, which often leads to performance bottlenecks in industrial settings. While entity graphs have the potential to serve as a bridge between domains, rudimentary utilization still fail to distill useful knowledge and even induce the negative transfer issue. To this end, we propose PEACE, a Prototype lEarning Augmented transferable framework for Cross-domain rEcommendation. For domain gap bridging, PEACE is built upon a multi-interest and entity-oriented pre-training architecture which could not only benefit the learning of generalized knowledge in a multi-granularity manner, but also help leverage more structural information in the entity graph. Then, we bring the prototype learning into the pre-training over source domains, so that representations of users and items are greatly improved by the contrastive prototype learning module and the prototype enhanced attention mechanism for adaptive knowledge utilization. To ease the pressure of online serving, PEACE is carefully deployed in a lightweight manner, and significant performance improvements are observed in both online and offline environments.

* Accepted by WSDM 2024

Via

Access Paper or Ask Questions